[pull] dev from ArchiveBox:dev #1
Open
pull[bot] wants to merge 4557 commits into
'youtube_dl',
], capture_output=True, text=True, cwd=out_dir).stdout.split('Location: ')[-1].split('\n', 1)[0]
NEW_YOUTUBEDL_BINARY = Path(pkg_path) / 'youtube_dl' / '__main__.py'
os.chmod(NEW_YOUTUBEDL_BINARY, 0o777)
Check failure
Code scanning / CodeQL
Overly permissive file permissions
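The alert above is raised because mode `0o777` makes the file writable by every user on the system. A minimal sketch of a narrower alternative follows, assuming the script only needs to be writable by its owner and executable by everyone else. `make_executable` is a hypothetical helper for illustration, not part of ArchiveBox, and the demo assumes a POSIX filesystem.

```python
import os
import stat
import tempfile
from pathlib import Path


def make_executable(path: Path) -> int:
    """Set rwxr-xr-x (0o755) on path and return the resulting permission bits."""
    os.chmod(path, 0o755)
    return stat.S_IMODE(path.stat().st_mode)


# Demo on a throwaway file rather than a real youtube_dl install.
with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "__main__.py"
    script.write_text("print('hello')\n")
    mode = make_executable(script)
    # S_IWOTH is the "writable by others" bit that 0o777 would have set.
    world_writable = bool(mode & stat.S_IWOTH)
```

With `0o755` the owner keeps full access while group and other users can only read and execute, which is usually enough for a script that is invoked but never edited by other accounts.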
if PUBLIC_INDEX:
    return redirect('/public')

return redirect(f'/admin/login/?next={request.path}')
Check warning
Code scanning / CodeQL
URL redirection from remote source
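This alert fires because `request.path` is attacker-influenced and is interpolated into the redirect URL unescaped. One possible mitigation, sketched with the standard library only, is to accept the value only when it is a same-site absolute path and to percent-encode it before placing it in the `next` parameter. `is_local_path` and `login_redirect_url` are hypothetical names, not functions from the ArchiveBox codebase.

```python
from urllib.parse import quote, urlsplit


def is_local_path(target: str) -> bool:
    """Return True only for same-site absolute paths such as '/archive/123/'."""
    # Reject relative paths and scheme-relative URLs like '//evil.example'.
    if not target.startswith("/") or target.startswith("//"):
        return False
    # Some browsers treat a backslash like a forward slash.
    if "\\" in target:
        return False
    parts = urlsplit(target)
    return not parts.scheme and not parts.netloc


def login_redirect_url(request_path: str, fallback: str = "/") -> str:
    """Build the login URL, falling back to '/' for anything non-local."""
    next_path = request_path if is_local_path(request_path) else fallback
    # Percent-encode so the path cannot inject extra query parameters.
    return "/admin/login/?next=" + quote(next_path, safe="/")
```

In a Django project the host check is more commonly delegated to `django.utils.http.url_has_allowed_host_and_scheme`; the stdlib version above is only meant to show the shape of the check.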
def get(self, request, path):
    if not request.user.is_authenticated and not PUBLIC_SNAPSHOTS:
        return redirect(f'/admin/login/?next={request.path}')
Check warning
Code scanning / CodeQL
URL redirection from remote source
# missing trailing slash -> redirect to index
if '/' not in path:
    return redirect(f'{path}/index.html')
Check warning
Code scanning / CodeQL
URL redirection from remote source
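Here the redirect target begins with the user-supplied `path` itself, so a value containing a scheme (for example `https:evil.example`) could steer the browser elsewhere. A hedged sketch of one way to narrow this is to redirect only when the segment consists of characters expected in a snapshot identifier. The allowed character set and the `index_redirect_target` helper are assumptions for illustration and may not match what ArchiveBox actually accepts.

```python
import re
from typing import Optional

# Assumed identifier alphabet: letters, digits, dot, underscore, hyphen.
SAFE_SEGMENT = re.compile(r"[A-Za-z0-9._-]+")


def index_redirect_target(path: str) -> Optional[str]:
    """Return '<path>/index.html' for a plain single segment, else None."""
    if "/" in path:
        return None  # already has a sub-path, so this rule does not apply
    if path in {".", ".."} or not SAFE_SEGMENT.fullmatch(path):
        return None  # contains ':' or other characters that could form a URL
    return f"{path}/index.html"
```

A caller would issue the redirect only when the function returns a string and otherwise respond with a 404.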
    response = super().get(*args, **kwargs)
    return response
else:
    return redirect(f'/admin/login/?next={self.request.path}')
Check warning
Code scanning / CodeQL
URL redirection from remote source
def add_view(self, request):
    if not request.user.is_authenticated:
        return redirect(f'/admin/login/?next={request.path}')
Check warning
Code scanning / CodeQL
URL redirection from remote source
…l limits, redirect to abx-plugins)

This rewrite (now reapplied on top of the wiki subtree) covers the full
session's work on Configuration.md:
- Add crawl/snapshot limits (CRAWL_MAX_URLS/SIZE/TIMEOUT,
  CRAWL_MAX_CONCURRENT_SNAPSHOTS, SNAPSHOT_MAX_SIZE), DELETE_AFTER,
  PERMISSIONS, PLUGINS/ENABLED_PLUGINS/ACTIVE_PERSONA.
- Add new Database Settings section (SQLITE_* tuning + DATABASE_NAME).
- Add SERVER_SECURITY_MODE deep-dive (4 modes, host-layout table).
- Add Storage path overrides (DATA_DIR, ARCHIVE_DIR, USERS_DIR, PERSONAS_DIR,
  CRAWL_DIR, SNAP_DIR, ALLOW_NO_UNIX_SOCKETS).
- Remove ALLOWED_HOSTS + CSRF_TRUSTED_ORIGINS as user-settable; both
  auto-derived from BASE_URL + SERVER_SECURITY_MODE. Backward-compat anchors
  preserved on BASE_URL with the 0.7.3 -> 0.9 legacy upgrade note.
- Remove the entire Plugin Settings tree (~200 options, 41 subsections);
  replace with prominent redirect to https://archivebox.github.io/abx-plugins/
  and a "shared core options that plugins fall back to" table.
- Add 231 backward-compat <a id="..."></a> anchors so old URLs to plugin
  sections / removed options / multi-option headers all still resolve
  (e.g. #wget_args -> Plugin Configuration section,
  #public_snapshots -> PERMISSIONS, #ssl_enabled -> Plugin Configuration,
  #admin_username -> ADMIN_USERNAME/PASSWORD heading,
  #dir_output_permissions -> OUTPUT_PERMISSIONS,
  #url_blacklist -> URL_DENYLIST).
- Fix wrong default: PUBLIC_ADD_VIEW is False, not True.
- Drop the 7 TRAFILATURA_OUTPUT_* per-format flags (replaced by single
  TRAFILATURA_OUTPUT_FORMATS in plugin) and SSL_ENABLED/SSL_TIMEOUT (wrong
  plugin namespace); anchors redirected to Plugin Configuration.
- Reframe COOKIES_FILE as low-level escape hatch; personas are the preferred
  auth path.
- Link every named plugin to its specific anchor on the abx-plugins page
  (e.g. WGET_TIMEOUT -> #wget, SONIC_HOST -> #search_backend_sonic).
- Strip implementation-detail mentions (Pydantic, etc.).
- Slim Shell Options to only user-settable (DEBUG, USE_COLOR, SHOW_PROGRESS);
  drop IS_TTY/IN_DOCKER/IN_QEMU.
- Restructure: General -> Server (+LDAP) -> Storage -> Database (new) ->
  Search -> Shell -> Plugin Configuration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ugin selector

ENABLED_PLUGINS and PLUGINS were two near-identical config keys: PLUGINS was
the CLI/per-run whitelist (--plugins flag, runner config), while
ENABLED_PLUGINS was the UI/API "persisted enabled set", but both ended up
steering the same plugin resolution. Consolidating on PLUGINS as the single
source of truth.
- archivebox/config/common.py: drop the ENABLED_PLUGINS Field entirely (no
  alias, no compat shim; the migration is one-shot).
- archivebox/hooks.py:get_enabled_plugins(): read PLUGINS instead of
  ENABLED_PLUGINS. Function name kept (describes the return value).
- archivebox/templates/core/add.html: admin "Add" form JS now writes to
  PLUGINS; help text updated to reference PLUGINS.

views.py:1302 and runner.py:585 already read/wrote PLUGINS; they're now
consistent with the resolver. abx-dl is unaffected: it receives
selected_plugins as a Python list argument and never reads either config key.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Several inaccuracies + over-documentation cleaned up in one pass:
- ONLY_NEW: completely rewrite. The old prose ("ArchiveBox will never
re-download sites that have already succeeded previously") was carried
over from 0.7.x and is wrong in 0.9.x — setting ONLY_NEW=False (or
--no-only-new) explicitly creates a new Snapshot and re-runs every
extractor. Now describes the actual behavior: skip URL entirely vs.
create a new Snapshot for it.
- CRAWL_MAX_CONCURRENT_SNAPSHOTS: fix the "each concurrent Snapshot
launches its own Chrome instance" claim. Chrome is crawl-scoped by
default (CHROME_ISOLATION="crawl") — concurrent Snapshots share the
crawl's Chrome via tabs, not separate browser processes.
- BASE_URL: drop the "admin.admin.admin.<host> compounding bug"
reference. Config docs shouldn't explain legacy bugs.
- Remove derived/runtime-only options that are NOT user-settable:
ACTIVE_PERSONA (set by persona resolver), CRAWL_DIR/SNAP_DIR (injected
by orchestrator per-call), DATA_DIR (derived from cwd), ARCHIVE_DIR
(derived from DATA_DIR/archive), USERS_DIR (derived from ARCHIVE_DIR),
PERSONAS_DIR (derived from DATA_DIR), LIB_BIN_DIR (tracks LIB_DIR),
DATABASE_NAME (derived from DATA_DIR/index.sqlite3). Backward-compat
<a id="..."></a> anchors preserved for all of them above the nearest
surviving heading so external links still resolve.
- LIB_DIR: fix default path. The doc claimed "<DATA_DIR>/lib/<arch>-<os>"
but constants.py:117 uses platformdirs.user_config_path("abx") / "lib"
— the XDG user-config dir, not inside the data folder. Updated to the
actual default.
- ENABLED_PLUGINS section dropped (option removed in a separate commit);
anchor redirected to PLUGINS.
- Drop the "Pydantic config" implementation-detail mention in PUID/PGID.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…p default to 50

admin_snapshots.py:571 had min(max(50, SNAPSHOTS_PER_PAGE), 500), and
admin_archiveresults.py:501 had min(max(5, SNAPSHOTS_PER_PAGE), 5000). Both
clamps silently overrode the configured value: a documented default of 40 was
inaccessible in the Snapshot admin, and the ArchiveResult admin also reused
the same setting without being mentioned in the docs.
- Drop both clamps; admin changelists now use SNAPSHOTS_PER_PAGE as-is.
- Bump the default in common.py from 40 to 50 (matches what users were
  actually seeing in the admin under the old floor).
- Add ge=1 validation so non-positive values are rejected at config parse
  time instead of producing broken pagination.
- Update Configuration.md: new default 50, clarify the option drives both
  Snapshot and ArchiveResult admin changelists plus the public index, and
  that it must be >= 1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
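To make the effect of the removed clamps concrete, the sketch below reproduces the two expressions quoted in the commit message and evaluates them for the old documented default of 40. The function names are hypothetical, and `validate_per_page` is a plain-Python stand-in for the `ge=1` constraint, not the actual config field definition.

```python
def old_snapshot_page_size(configured: int) -> int:
    """Clamp quoted from admin_snapshots.py: floor of 50, ceiling of 500."""
    return min(max(50, configured), 500)


def old_archiveresult_page_size(configured: int) -> int:
    """Clamp quoted from admin_archiveresults.py: floor of 5, ceiling of 5000."""
    return min(max(5, configured), 5000)


def validate_per_page(value: int) -> int:
    """Stand-in for ge=1: reject non-positive page sizes up front."""
    if value < 1:
        raise ValueError("SNAPSHOTS_PER_PAGE must be >= 1")
    return value


# The old documented default of 40 never reached the Snapshot admin.
snapshot_admin_size = old_snapshot_page_size(40)
archiveresult_admin_size = old_archiveresult_page_size(40)
```

Under the old code the same setting produced 50 rows in one changelist and 40 in the other, which is the inconsistency this commit removes.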
Sweep of all prose doc pages to fix references that were stale, wrong, or
pointed at anchors/options that no longer exist in 0.9.x.

Critical (non-functional examples + factual errors):
- All `PUBLIC_SNAPSHOTS=...` examples (Security-Overview,
  Publishing-Your-Archive, Usage) replaced with `PERMISSIONS=public|private`.
- Setting-up-Authentication: drop the "edit CSRF_TRUSTED_ORIGINS in
  archivebox/core/settings.py source" advice (no longer user-settable);
  update auth-permissions list to use PERMISSIONS instead of PUBLIC_SNAPSHOTS.
- Security-Overview: SAVE_ARCHIVE_DOT_ORG (with extra underscores) was never
  real; use ARCHIVEDOTORG_ENABLED.
- Docker/Install/Usage: FETCH_TITLE/FETCH_SCREENSHOT/FETCH_PDF/FETCH_DOM were
  never aliases (only FETCH_MEDIA is); replace with real <PLUGIN>_ENABLED.
- Troubleshooting: CHROME_BINARY default is `chromium`, not
  `chromium-browser`. Also fixed deprecated
  `brew cask upgrade chromium-browser` -> `brew upgrade --cask chromium`.
- Docker: typo MAX_MEDIA_SIZE -> MEDIA_MAX_SIZE.

Broken Configuration anchors (must be lowercase on GitHub wiki):
- Security-Overview: #FOOTER_INFO / #OUTPUT_PERMISSIONS / #COOKIES_FILE ->
  lowercase.
- Setting-up-Authentication: combined
  #public_index--public_snapshots--public_add_view -> individual
  #public_index / #public_add_view / #permissions.

Plugin option references now link to abx-plugins:
- CHROME_USER_DATA_DIR / CHROME_BINARY / CHROME_SANDBOX -> /#chrome
- RIPGREP_BINARY -> /#search_backend_ripgrep
- WGET_ENABLED / DOM_ENABLED / SAVE_WGET / SAVE_DOM -> respective anchors
- ARCHIVEDOTORG_ENABLED -> /#archivedotorg
- FAVICON_PROVIDER / FAVICON_ENABLED -> /#favicon
- MEDIA_ENABLED -> /#media

Legacy aliases:
- Scheduled-Archiving: URL_WHITELIST/URL_BLACKLIST ->
  URL_ALLOWLIST/URL_DENYLIST; dropped non-existent `--overwrite` schedule
  flag.

Dead source links removed:
- Usage: archivebox/main.py + archivebox/config.py (split to cli/ and
  config/common.py).
- Security-Overview: archivebox/extractors/*.py -> plugin anchors.
- Install: dead Configuration#dependency-options and
  Configuration#archive-method-toggles anchors -> abx-plugins reference.

Typo fixes (codespell):
- preferrably -> preferably, necesary -> necessary, Rasberry -> Raspberry,
  sytem -> system, Dissallow -> Disallow, whats -> what's,
  filesytem -> filesystem.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Clean rebuild of docs/apidocs/archivebox/ via autodoc2:
- Removes 19 stale module pages whose source files no longer exist
  (cli_utils, host_utils, schedule_utils, which were renamed to *_util;
  actors, ideas, debugging, folders, legacy, progress_layout, tests_piping,
  config_tags, personas.runtime/views, orchestrator*, worker, tasks).
- Adds 29 new module pages for code that was added since the previous
  generation but not yet documented.
- Updates 100 existing pages to reflect API surface changes (e.g.
  ENABLED_PLUGINS Field removed, SNAPSHOTS_PER_PAGE clamp removed, default
  bumped to 50, etc.).

156 source modules <-> 156 apidoc files (zero drift). Build clean under
sphinx-build -W --keep-going.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)
Can you help keep this open source service alive? 💖 Please sponsor : )