Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
pkg.go.dev API migration notes
Goal
This migration replaces the old metadata discovery path based on GitHub API calls, ?go-get=1 HTML parsing, and golang.org/x/tools/go/vcs with the official pkg.go.dev API (https://pkg.go.dev/v1beta) plus Go module proxy fallbacks.
The main reason is that RPM spec generation should follow Go module semantics, not just Git repository layout. pkg.go.dev already resolves module paths, versions, README content, SPDX license metadata, repository URLs, and ambiguous import paths in the same ecosystem used by go get and go mod.
Code changes
- Added `pkgsite.go` as the single pkg.go.dev API client.
- Added `pkgsite_test.go` covering API error retry, ambiguity handling, summary cleanup, license expression generation, module proxy escaping, Source0 generation, and type detection.
- Removed GitHub API client setup and credential handling from `main.go`.
- Removed `github.com/google/go-github/v60` and deprecated `golang.org/x/tools/go/vcs` dependencies from `go.mod`.
- Replaced GitHub README/license/repository lookups in `metadata.go` and `description.go` with pkgsite module/package metadata.
- Reworked `pack.go` to use pkgsite module path, package name, version, and repository URL.
- Reworked `spec.go` so `URL` and `Source0` come from pkgsite/repository/module-proxy information rather than hard-coded GitHub owner/repo parsing.
Change classification
API and data-source changes
These changes replace old discovery sources with Go ecosystem sources, without being output features by themselves:
- Added the pkg.go.dev v1beta client with retry handling, response-size cap, in-process metadata cache, and the HTTP cache transport wired in `main.go`.
- Removed the GitHub API client, credential/token handling, GitHub README/license lookups, `?go-get=1` HTML scraping, and `golang.org/x/tools/go/vcs` repo-root guessing.
- Replaced old metadata sources with pkgsite module/package metadata for module path, package name, repo URL, README, synopsis, license, version, imports, and ambiguity candidates.
- Added Go module proxy zip URL support and module-proxy path escaping; `.mod` parsing is still future work.
- Replaced `vcs.VCS.Tags` with local `git tag --list` over the cloned repository, and replaced VCS abstraction checkout with direct `git clone` / `git checkout`.
- Replaced the flattened `go list -f` dependency scan with structured `go list -e -json`.
- Removed `go-github`, `x/tools/go/vcs`, and obsolete transitive dependencies from `go.mod` / `go.sum`.
Feature enhancements independent of the API migration
These are output, policy, or robustness improvements that could exist with any metadata source:
- Preserve the user-requested import path for package naming and `%define go_import_path`, while using the resolved module path for source checkout and tarball generation.
- Preserve `/vN` semantic-import-version suffixes in generated RPM names.
- Add RPM-safe summary cleanup helpers that strip markup, badges, README headings, and Go doc `Package ...` prefixes.
- Split dependencies into normal imports, test-only imports, and their union; `BuildRequires` uses the union, while library runtime `Requires` uses normal imports only. This still depends on a heuristic import-path-to-RPM conversion, so resolving imports to owning modules remains future work.
- Derive runtime/build dependency sets inside `writeSpec` instead of passing a flattened dependency list.
- Fix the missing `gopkg` argument in `writeRPMLibrarySubpackage`.
- Fall back from failed hoster tarball downloads to local `git archive` instead of failing immediately.
- Add `go.yaml.in -> yaml` to the short host-name mapping.
- Add migration notes and regression tests for retry/error handling, ambiguity, summary cleanup, license expressions, proxy escaping, Source0 generation, type detection, dependency splitting, and spec asset paths.
Feature enhancements enabled by the new API/data sources
These user-visible improvements depend on pkgsite, module-proxy, or structured Go metadata:
- Generate real `License:` values from pkgsite SPDX license types, including multi-license SPDX expressions.
- Generate real `URL:` values for non-GitHub modules and normalize `cs.opensource.google/go/x/*` to `go.googlesource.com/*`.
- Generate `Summary:` from pkgsite package synopsis, with pkgsite README fallback and cleanup.
- Generate long `%description` from pkgsite `module.readme.contents`.
- Prefer pkgsite's latest module version, then validate it against local Git tags and check out the selected release tag before dependency/tarball discovery.
- Resolve module paths and ambiguous import paths using pkgsite candidates instead of repository-root guessing.
- Use module-proxy `Source0` zips for monorepo submodules and unsupported hosters while keeping hoster tarballs as the primary source for normal repo-root modules.
- Refuse invalid module-proxy `Source0` for commit-pinned subdirectory modules by leaving `Source0: TODO` unless a canonical pseudo-version is available.
- Use pkgsite README and license file paths to emit exact `%doc` / `%license` entries such as `Readme` or `License`, and omit `%doc` when no README exists.
- Classify program/library type from pkgsite `Package.Name == "main"`; local `main` discovery remains only as a fallback.
Future enhancements that depend on the same data-source migration include per-package Provides from /v1beta/packages/{module}, cleaner BuildRequires from module proxy .mod files, cgo detection from imports, multi-module repository fan-out, and import-path-to-owning-module resolution.
Fields improved by official API data
| Field | Old behavior | New behavior |
|---|---|---|
| Module path | `vcs.RepoRootForImportPath` / repo root guessing | `package.modulePath` and ambiguity candidates from pkgsite |
| Version | `git describe` over repository tags | pkgsite latest module version is preferred, then validated against Git tags |
| Summary | GitHub repo description or README fallback | pkgsite package synopsis, cleaned for RPM `Summary` |
| License | GitHub license API or `TODO` | pkgsite SPDX license types |
| Repository URL | GitHub-only parsing / `TODO` for non-GitHub | pkgsite `module.repoUrl`; `cs.opensource.google/go/x/*` is normalized to `go.googlesource.com/*` |
| Source0 | Hoster table only | Hoster table plus `proxy.golang.org` zip fallback |
| Type guess | Any `package main` in repo meant program | Requested package name == "main" means program; command subpackages no longer misclassify libraries. If pkgsite reports no package name, local `main` detection remains a fallback |
License handling
The old githubLicenseToSPDXLicense path converted GitHub license API keys such as mit or apache-2.0 into local SPDX strings. That local conversion is intentionally removed because the new metadata source already returns SPDX license identifiers.
Current behavior:
- `getLicenseForGopkg` reads `module.licenses` from pkgsite and falls back to `package.licenses` only if the module has no license data.
- `pkgsiteLicenseExpression` uses `licenses[].types` exactly as returned by pkgsite. It does not translate GitHub keys or infer license names from filenames.
- If one license file has multiple pkgsite-detected types, they are joined with `OR`, for example `(MIT OR OFL-1.1)`.
- If multiple license files apply, the per-file groups are joined with `AND`.
- Top-level license files are preferred over nested license files, because nested files often belong to vendored or generated content.
- If pkgsite returns no license types, the generated `License:` field stays `TODO` instead of guessing.
- `%license` file entries use pkgsite `licenses[].filePath`, sanitized for spec output, so casing such as `License` is preserved.
This means License: correctness now depends on pkg.go.dev's official license detection data, not on GitHub's license API or a local GitHub-key-to-SPDX mapping. The generator still owns only the RPM expression policy: top-level preference, per-file OR, cross-file AND, duplicate removal, stable sorting, and TODO when the API does not provide SPDX types.
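The expression policy described above can be sketched in a few lines. This is an illustration only, not the project's `pkgsiteLicenseExpression`: the `licenseFile` struct and the `licenseExpression` and `dedupSorted` helpers are hypothetical names, and "top-level" is assumed to mean a file path without a directory component.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// licenseFile mirrors the two pkgsite fields named in these notes
// (licenses[].filePath and licenses[].types). The struct is hypothetical.
type licenseFile struct {
	FilePath string
	Types    []string
}

// licenseExpression is a hypothetical helper that applies the policy:
// top-level preference, per-file OR, cross-file AND, duplicate removal,
// stable sorting, and TODO when no SPDX types are available.
func licenseExpression(files []licenseFile) string {
	// Prefer top-level files (assumed: no "/" in the path) when any exist.
	var top []licenseFile
	for _, f := range files {
		if !strings.Contains(f.FilePath, "/") {
			top = append(top, f)
		}
	}
	if len(top) > 0 {
		files = top
	}

	seen := map[string]bool{}
	var groups []string
	for _, f := range files {
		types := dedupSorted(f.Types)
		if len(types) == 0 {
			continue
		}
		group := strings.Join(types, " OR ")
		if len(types) > 1 {
			group = "(" + group + ")"
		}
		if !seen[group] {
			seen[group] = true
			groups = append(groups, group)
		}
	}
	if len(groups) == 0 {
		return "TODO"
	}
	sort.Strings(groups)
	return strings.Join(groups, " AND ")
}

// dedupSorted removes empty and duplicate entries and sorts the rest.
func dedupSorted(in []string) []string {
	set := map[string]bool{}
	var out []string
	for _, s := range in {
		if s != "" && !set[s] {
			set[s] = true
			out = append(out, s)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println(licenseExpression([]licenseFile{
		{FilePath: "COPYING", Types: []string{"OFL-1.1", "MIT"}},
		{FilePath: "vendor/x/LICENSE", Types: []string{"BSD-3-Clause"}},
	}))
	fmt.Println(licenseExpression(nil))
}
```

In this sketch the nested `vendor/x/LICENSE` entry is ignored because a top-level file exists, and an empty input yields `TODO` rather than a guess.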
Important behavior changes
- `/vN` module paths are preserved in generated package names, for example `github.com/alecthomas/chroma/v2` becomes `go-github-alecthomas-chroma-v2`. This is intentional because pkgsite reports the module path, not only the Git repository root.
- Summary differences versus the original tool are expected. The new summaries come from pkgsite package synopsis or cleaned README text.
- `golang.org/x/*` URLs now point at `go.googlesource.com/*` and Source0 uses `proxy.golang.org`.
- Monorepo submodules such as `github.com/charmbracelet/x/ansi` use module-proxy Source0 so the source is scoped to that module instead of the whole monorepo tarball.
- Root libraries with command subpackages, such as `go.yaml.in/yaml/v4`, are treated as libraries by default. Use `-type` to override when a package should be generated as a program or combined package.
- Requested subpackage paths are preserved for package naming and `%define go_import_path`; the module root is used only for source checkout and tarball generation.
- Dependency discovery now keeps normal imports separate from `TestImports` and `XTestImports`. `BuildRequires` uses the union, while library runtime `Requires` uses normal imports only.
- Commit-pinned subdirectory modules no longer emit invalid module-proxy `Source0` URLs; a canonical pseudo-version is required before the proxy can serve those sources, otherwise `Source0` is left as `TODO` for manual handling.
Dependency data
The old implementation did not run a full compile to discover dependencies. It ran:
```
go list -e -f '{{join .Imports "\n"}}
{{join .TestImports "\n"}}
{{join .XTestImports "\n"}}' <repo>/...
```
Then it removed imports that are part of the packaged repository and imports from the Go standard library. The remaining import paths were all written as RPM BuildRequires. This means regular compile imports, internal test imports, and external test-package imports were already included, but they were merged into one set and emitted only as BuildRequires; no %check or test-only bucket existed.
The pkg.go.dev API has package-level dependency data, but it is not a complete module-level RPM dependency source:
- `/v1beta/package/{path}?imports=true` returns the direct imports for one package. These are package import paths, not module paths or RPM package names.
- `/v1beta/packages/{module}` enumerates packages in a module, but does not return the dependency graph by itself.
- `/v1beta/module/{module}` exposes module metadata such as version and repository URL. Although the OpenAPI schema contains a `goModContents` field, the field is currently empty or partial in `v1beta` responses; do not use it. Fetch `https://proxy.golang.org/<module>/@v/<version>.mod` directly instead.
For cleaner BuildRequires, the preferred future approach is to fetch the official module proxy .mod file for the selected version and parse it with golang.org/x/mod/modfile. Direct require entries can represent normal module dependencies, while // indirect entries can be handled separately or ignored depending on RPM policy. Package-level pkgsite imports=true can still be useful for cgo detection, identifying direct package imports, and resolving import paths back to modules when needed.
Test-only dependencies are now tracked separately by reading structured go list -json output. The generator stores normal imports, test-only imports, and their union separately. Today the union is emitted as BuildRequires, and only normal imports are emitted as library runtime Requires. Because import paths are still collapsed to RPM provides by a heuristic, a test-only import that shares the same top-level provide as a runtime import can still appear in runtime Requires; resolving imports to owning module paths is a follow-up. A future RPM-macro-specific enhancement can emit the test-only set as %check-scoped dependencies or another separate generated list.
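A minimal sketch of the split, assuming the concatenated JSON objects that `go list -e -json` prints are already available as a string. The `listedPackage`, `depSets`, and `splitImports` names are hypothetical and not taken from the generator. Only the `Imports`, `TestImports`, and `XTestImports` fields are read, and filtering of standard-library and in-repository imports is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"sort"
	"strings"
)

// listedPackage holds the subset of `go list -json` fields used here.
type listedPackage struct {
	ImportPath   string
	Imports      []string
	TestImports  []string
	XTestImports []string
}

// depSets is a hypothetical container for the three tracked sets.
type depSets struct {
	Normal   []string // imports of non-test files
	TestOnly []string // imports seen only in test files
	Union    []string // Normal plus TestOnly
}

// splitImports decodes a stream of JSON objects, one per package, and
// separates normal imports from test-only imports. The pseudo-import "C"
// is skipped, matching the current behavior described in these notes.
func splitImports(goListJSON string) (depSets, error) {
	normal := map[string]bool{}
	test := map[string]bool{}
	dec := json.NewDecoder(strings.NewReader(goListJSON))
	for {
		var p listedPackage
		if err := dec.Decode(&p); err == io.EOF {
			break
		} else if err != nil {
			return depSets{}, err
		}
		for _, imp := range p.Imports {
			if imp != "C" {
				normal[imp] = true
			}
		}
		for _, list := range [][]string{p.TestImports, p.XTestImports} {
			for _, imp := range list {
				if imp != "C" {
					test[imp] = true
				}
			}
		}
	}

	var out depSets
	for imp := range normal {
		out.Normal = append(out.Normal, imp)
		out.Union = append(out.Union, imp)
	}
	for imp := range test {
		if !normal[imp] {
			out.TestOnly = append(out.TestOnly, imp)
			out.Union = append(out.Union, imp)
		}
	}
	sort.Strings(out.Normal)
	sort.Strings(out.TestOnly)
	sort.Strings(out.Union)
	return out, nil
}

func main() {
	// Illustrative input with made-up import paths.
	const sample = `{"ImportPath":"example.com/m","Imports":["fmt","example.com/dep"],"TestImports":["testing","example.com/dep"],"XTestImports":["example.com/testdep"]}
{"ImportPath":"example.com/m/sub","Imports":["C","strings"]}`
	sets, err := splitImports(sample)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println("normal:   ", sets.Normal)
	fmt.Println("test-only:", sets.TestOnly)
	fmt.Println("union:    ", sets.Union)
}
```

An import that appears in both a normal and a test file stays in the normal set only, so the test-only set contains exactly the paths that would be candidates for `%check`-scoped dependencies.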
If a dependency is needed only by tests but does not appear in the generated spec, the likely causes are:
- `go list -e -json` may return partial package data when a package cannot be loaded. The current code logs package load errors; these diagnostics still need policy decisions for when generation should fail.
- Only the default build context is analyzed. Test files behind non-default build tags, GOOS, GOARCH, cgo settings, or downstream RPM-specific tags are not visible unless those settings are passed to `go list`.
- `passthroughEnv` currently preserves only a small environment allowlist. It does not pass variables such as `GOFLAGS`, `GOOS`, `GOARCH`, `CGO_ENABLED`, `GOPROXY`, `GONOSUMDB`, or `GOPRIVATE`, so local settings that would make additional test files or private/proxy dependencies visible can be lost.
- Imports equal to the packaged repo path or prefixed by `<repo>/` are dropped as internal packages. This is correct for normal subpackages, but can be wrong for monorepos or nested modules where a same-prefix import is actually a separate packaged module.
- `convertDependenciesToRPM` collapses import paths to a top-level path such as `github.com/user/repo`, with only a simple `/vN` exception. Nested modules, vanity paths, and monorepo submodules can therefore be converted to the wrong RPM virtual provide, making a dependency look missing.
- Test cgo imports are represented as `C`, and the current code explicitly ignores `C`, so C compiler, pkg-config, and C library requirements used only by tests are not derived.
- Generated test files are not considered because `go list` does not run `go generate`.
The safer design is to keep dependency provenance all the way through generation: normal imports, in-package test imports, external test-package imports, cgo usage, and module require data should be stored as separate sets. The spec writer can then emit ordinary build dependencies and test-only dependencies independently, instead of filtering a flattened import list after the fact.
Examples verified
go.yaml.in/yaml/v4
pkgsite reports the root package name as yaml, while cmd/go-yaml is a separate main subpackage. This package is not part of the go.mod comparison set; it was manually run as a regression check for library/program detection. The generated spec is now a library package:
```
Name: go-yaml-yaml-v4
Version: 4.0.0~rc4
Summary: Implements YAML 1.1/1.2 encoding and decoding for Go programs
License: Apache-2.0
BuildArch: noarch
BuildSystem: golangmodules
Provides: go(go.yaml.in/yaml/v4) = %{version}
```
golang.org/x/net
Old output used TODO metadata. New output uses official metadata and module proxy Source0:
```
License: BSD-3-Clause
URL: https://go.googlesource.com/net
Source0: https://proxy.golang.org/golang.org/x/net/@v/v%{version}.zip#/%{_name}-%{version}.zip
```
github.com/charmbracelet/x/ansi
This is a submodule inside the github.com/charmbracelet/x monorepo. Source0 now points at the module proxy zip:
```
Name: go-github-charmbracelet-x-ansi
Source0: https://proxy.golang.org/github.com/charmbracelet/x/ansi/@v/v%{version}.zip#/%{_name}-%{version}.zip
```
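The Source0 lines in these examples follow one URL shape, and module paths with uppercase letters additionally need the module proxy's case escaping (each uppercase ASCII letter becomes `!` plus its lowercase form). The sketch below shows both steps with the standard library only. `escapeModulePath` and `proxySource0` are hypothetical names, not the generator's functions, and the escaping here skips the path validation that `golang.org/x/mod/module.EscapePath` performs.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeModulePath applies the module proxy case encoding: every uppercase
// ASCII letter is replaced by '!' followed by the lowercase letter.
// Simplified stand-in: no validation of the path is done.
func escapeModulePath(path string) string {
	var b strings.Builder
	for _, r := range path {
		if r >= 'A' && r <= 'Z' {
			b.WriteByte('!')
			b.WriteRune(r + 'a' - 'A')
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// proxySource0 builds a Source0 value in the shape shown in the examples,
// leaving %{version} and %{_name} for RPM to expand.
func proxySource0(modulePath string) string {
	return "https://proxy.golang.org/" + escapeModulePath(modulePath) +
		"/@v/v%{version}.zip#/%{_name}-%{version}.zip"
}

func main() {
	fmt.Println(proxySource0("golang.org/x/net"))
	fmt.Println(proxySource0("github.com/charmbracelet/x/ansi"))
	// An uppercase path, used here only to illustrate the escaping rule.
	fmt.Println(escapeModulePath("github.com/BurntSushi/toml"))
}
```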
Dependency split validation
Recent manual validation confirmed that test/build-only imports are excluded from library runtime Requires:
- `github.com/charmbracelet/lipgloss`: `go(github.com/aymanbagabas/go-udiff)` appears in `BuildRequires` only.
- `github.com/aliyun/alibabacloud-oss-go-sdk-v2`: `go(github.com/stretchr/testify)` appears in `BuildRequires` only.
- `google.golang.org/grpc`: `go(google.golang.org/protobuf/testing)` appears in `BuildRequires` only.
Final comparison artifacts
The latest full comparison run is:
comparison-runs/go-mod-all-final-20260522184105/
Important files:
- `summary.txt` - readable per-package comparison.
- `fields-full.tsv` - full field-level data for original/current output.
- `diff-summary-full.tsv` - compact list of differing fields.
This run used all 27 require entries from go.mod.
Headline result:
- Current implementation: 27/27 packages generated successfully.
- Current generated specs: no `Summary`, `License`, `URL`, or `Source0` field remains `TODO`.
- Original implementation: 2 packages failed in the comparison baseline.
Notable improvements:
- `golang.org/x/*` packages now have real URL, License, and Source0 fields.
- `github.com/charmbracelet/glamour` and `github.com/charmbracelet/x/exp/slice` generated successfully in the current implementation.
- Multi-license modules such as `github.com/alecthomas/chroma/v2` now get SPDX expressions like `(MIT OR OFL-1.1)`.
Older comparison-runs/ directories are intermediate states from the migration and review process. Use go-mod-all-final-20260522184105 as the current reference.
Additional ad-hoc validation after the comparison run covered github.com/charmbracelet/bubbletea, github.com/charmbracelet/lipgloss, github.com/go-openapi/testify, github.com/hashicorp/go-secure-stdlib, github.com/aliyun/alibabacloud-oss-go-sdk-v2, github.com/aymerick/douceur, google.golang.org/grpc, github.com/apache/arrow, github.com/aws/smithy-go, and github.com/charmbracelet/x. These runs verified the dependency split for normal module cases and exposed the legacy GOPATH / package-collection cases tracked below.
Verification commands
```
go test ./...
go run . pack go.yaml.in/yaml/v4
go run . pack golang.org/x/net
go run . pack github.com/charmbracelet/x/ansi
```
Caveats
- pkg.go.dev API is currently `v1beta`; schema and behavior may change.
- pkg.go.dev only covers public modules. Private modules still need a different path.
- Newly tagged versions may have pkgsite indexing delay.
- `pkgsiteInfoCache` is in-process only and not version-keyed yet, so requesting two versions of the same module in one process can reuse the first cached result.
- There is also an in-memory HTTP cache transport for pkgsite requests, so cache behavior has two process-local layers.
- pkgsite response bodies are capped by `pkgsiteMaxResponseBytes`.
- pkgsite requests currently retry up to 3 times with fixed linear sleeps on transport errors, HTTP 429, or HTTP 5xx responses.
- Module-proxy zip Source0 archives unpack as module-version paths such as `module@vX.Y.Z`; generated specs may still need `%setup` / `%autosetup` adjustments depending on downstream RPM macros.
- Subdirectory modules still need an end-to-end checkout/layout review. The spec `Source0` may use module-proxy zip correctly, but dependency discovery currently clones the repository and may not analyze the actual submodule directory layout.
- For requested subpath programs, `%define _name` follows the requested path while hoster tarballs usually unpack using the repository or module root name. This can require `%setup` / `%autosetup -n` handling and needs a dedicated fix before declaring subpath program output fully buildable.
Possible split-out work: multi-module and legacy repositories
This section is not part of the current implementation plan. It records cases that may need a separate design pass after the pkgsite migration and dependency-splitting behavior are validated.
github.com/hashicorp/go-secure-stdlib is a package-collection repository, not a normal single Go module. pkgsite reports the repository root with hasGoMod: false, while subpaths such as github.com/hashicorp/go-secure-stdlib/base62 are separate modules with their own go.mod, versions, tags, dependencies, and potentially licenses. /v1beta/packages/github.com/hashicorp/go-secure-stdlib is module-scoped, so it returns only packages in the root pseudo-module, not sibling modules.
hasGoMod: false alone is not enough to classify a repository as a package collection. github.com/aymerick/douceur is a legacy GOPATH-style repository: pkgsite reports hasGoMod: false, but /v1beta/packages returns several importable packages under the same root rather than separate sibling modules. These repositories need GOPATH-mode dependency discovery, not fan-out into submodules.
Today, hasGoMod: false repositories can generate silently incomplete dependencies because go list -json runs in module-mode defaults and may match no packages. This was observed with github.com/aymerick/douceur and github.com/apache/arrow; GOPATH-mode discovery or a refusal policy is needed in the split-out work.
The generator should distinguish three shapes:
- Single module with many packages, such as `golang.org/x/text`: one module path, one version, one dependency closure. Use `/v1beta/packages/{module}` to emit `Provides: go(<pkgpath>) = %{version}` for each redistributable package in the module, and keep `go list -json ./...` for dependency discovery.
- Legacy GOPATH repository, such as `github.com/aymerick/douceur`: no `go.mod`, but one repository-level package set. Treat it like a single package set, but run dependency discovery in GOPATH mode and still emit per-package `Provides`.
- Multi-module repository / package collection, such as `github.com/hashicorp/go-secure-stdlib` or `github.com/charmbracelet/x`: multiple subdirectories are independent modules. Do not fold them into one source RPM, because versions, Source0 paths, dependencies, and licenses can diverge.
Default behavior for confirmed package-collection roots should be refusal with a clear diagnostic, not generation of a mostly empty root spec. The diagnostic should explain that sibling modules were detected and suggest packaging a concrete submodule path, for example github.com/hashicorp/go-secure-stdlib/base62. An opt-in --fanout or --submodules=a,b,c mode can later generate one independent spec per discovered submodule.
Currently, packaging a bare package-collection root such as github.com/charmbracelet/x fails at pkgsite lookup time with "not found"; the clearer diagnostic described here is not implemented.
Submodule discovery should avoid GitHub-specific APIs. Preferred sources are git ls-remote --tags <repoUrl> grouped by tags like <subdir>/vX.Y.Z, followed by pkgsite /v1beta/module/{repo}/{subdir} probes to keep only paths with hasGoMod: true. Each discovered submodule must run the normal single-module pipeline independently: resolve its pkgsite version, Source0, license, package list, normal/test dependencies, and spec name.
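The tag-grouping step could look roughly like the sketch below, which parses text in the shape printed by `git ls-remote --tags` (an object ID, whitespace, then a `refs/tags/...` ref, with `^{}` entries for peeled annotated tags). `groupSubmoduleTags` is a hypothetical helper, the sample subdirectory names are made up, and the follow-up pkgsite `hasGoMod` probe is not shown.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// versionPrefix accepts tags whose last path element starts with vX.Y.Z.
// Pre-release or build suffixes after the patch number are allowed.
var versionPrefix = regexp.MustCompile("^v[0-9]+\\.[0-9]+\\.[0-9]+")

// groupSubmoduleTags maps each candidate subdirectory to the versions tagged
// as <subdir>/vX.Y.Z. Root-level tags such as v1.2.3 are ignored because they
// belong to the repository root, not to a submodule.
func groupSubmoduleTags(lsRemoteOutput string) map[string][]string {
	groups := map[string][]string{}
	seen := map[string]bool{}
	for _, line := range strings.Split(lsRemoteOutput, "\n") {
		fields := strings.Fields(line)
		if len(fields) != 2 || !strings.HasPrefix(fields[1], "refs/tags/") {
			continue
		}
		// Peeled entries for annotated tags end in "^{}"; fold them into
		// the plain tag name so each tag is counted once.
		tag := strings.TrimSuffix(strings.TrimPrefix(fields[1], "refs/tags/"), "^{}")
		i := strings.LastIndex(tag, "/")
		if i <= 0 {
			continue // no subdirectory component
		}
		subdir, version := tag[:i], tag[i+1:]
		if !versionPrefix.MatchString(version) || seen[tag] {
			continue
		}
		seen[tag] = true
		groups[subdir] = append(groups[subdir], version)
	}
	return groups
}

func main() {
	// Illustrative input; object IDs and subdirectory names are invented.
	sample := "1111\trefs/tags/alpha/v0.1.0\n" +
		"2222\trefs/tags/alpha/v0.1.0^{}\n" +
		"3333\trefs/tags/alpha/v0.2.0\n" +
		"4444\trefs/tags/tools/beta/v1.0.0\n" +
		"5555\trefs/tags/v2.0.0\n"
	for subdir, versions := range groupSubmoduleTags(sample) {
		fmt.Println(subdir, versions)
	}
}
```

Each key in the result is only a candidate: a tag prefix does not prove that the directory contains a `go.mod`, which is why the probe for `hasGoMod: true` remains a separate step.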
This feature depends on two dependency fixes:
- Resolve import paths to owning module paths before converting them to RPM virtual provides. For example, `github.com/hashicorp/go-secure-stdlib/base62` must map to the submodule RPM name, not the repository-root name.
- Compare internal imports against the module path being packaged, not just the repository prefix. Sibling modules share a repository prefix but are external dependencies from Go module and RPM perspectives.
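Both fixes reduce to a longest-prefix match against a set of known module paths. The sketch below assumes that set has already been collected (for example from pkgsite or the submodule discovery step); `owningModule` and `isInternal` are hypothetical helpers and the `example.com/mono` paths are placeholders.

```go
package main

import (
	"fmt"
	"strings"
)

// owningModule returns the longest known module path that equals importPath
// or is a path prefix of it. Matching is done on whole path elements, so
// "example.com/mono" does not own "example.com/monolith".
func owningModule(importPath string, modules []string) (string, bool) {
	best := ""
	for _, m := range modules {
		if importPath == m || strings.HasPrefix(importPath, m+"/") {
			if len(m) > len(best) {
				best = m
			}
		}
	}
	return best, best != ""
}

// isInternal reports whether importPath belongs to the module being
// packaged, rather than merely sharing its repository prefix.
func isInternal(importPath, packagedModule string, modules []string) bool {
	owner, ok := owningModule(importPath, modules)
	return ok && owner == packagedModule
}

func main() {
	modules := []string{"example.com/mono", "example.com/mono/sub"}

	owner, _ := owningModule("example.com/mono/sub/pkg", modules)
	fmt.Println("owner:", owner)

	// A sibling module is an external dependency of the root module.
	fmt.Println(isInternal("example.com/mono/sub/pkg", "example.com/mono", modules))
	// A package outside any nested module stays internal to the root.
	fmt.Println(isInternal("example.com/mono/other", "example.com/mono", modules))
}
```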
Follow-up work
- Use the documented `/v1beta/packages/{module}` endpoint to enumerate all packages in a module and improve `library+program` / `program+library` generation.
- Validate and fix subdirectory module checkout/layout so dependency discovery analyzes the module subdirectory while `Source0` remains scoped to the module.
- Validate and fix requested subpath program specs so `_name`, `Source0`, and `%setup` agree on the extracted source directory.
- Use Go module proxy `.mod` files and `golang.org/x/mod/modfile` to derive direct module dependencies for cleaner `BuildRequires`.
- Emit the tracked test-only dependency set through RPM-macro-specific `%check` dependency mechanisms if the target macro set supports it.
- Use pkgsite imports or source scanning to detect cgo and improve `ExclusiveArch` / C toolchain BuildRequires.
- Improve Source0 and `%setup` semantics for module-proxy zip sources.
- Improve retry behavior by honoring `Retry-After`, adding jitter, and using exponential backoff.
- Add fixture tests for real pkgsite API responses to catch future `v1beta` schema drift.
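For the retry item, one possible shape of the delay calculation is sketched below. It is not the current client code: `retryDelay` is a hypothetical helper, the base and cap values are arbitrary, only the delta-seconds form of `Retry-After` is handled (the HTTP-date form is ignored), and the jitter source is injected so the function stays deterministic in tests.

```go
package main

import (
	"fmt"
	"math/rand"
	"strconv"
	"time"
)

// retryDelay returns how long to wait before retry number attempt (0-based).
// A numeric Retry-After value wins over the computed backoff; otherwise the
// delay doubles per attempt up to max and is jittered into [d/2, d].
// jitter must return a value in [0, 1].
func retryDelay(attempt int, retryAfter string, base, max time.Duration, jitter func() float64) time.Duration {
	if secs, err := strconv.Atoi(retryAfter); err == nil && secs >= 0 {
		d := time.Duration(secs) * time.Second
		if d > max {
			d = max
		}
		return d
	}

	d := base
	for i := 0; i < attempt && d < max; i++ {
		d *= 2
	}
	if d > max {
		d = max
	}
	half := d / 2
	return half + time.Duration(jitter()*float64(d-half))
}

func main() {
	base, max := 500*time.Millisecond, 8*time.Second
	for attempt := 0; attempt < 4; attempt++ {
		fmt.Println("attempt", attempt, "wait", retryDelay(attempt, "", base, max, rand.Float64))
	}
	// A server-provided Retry-After of 3 seconds overrides the backoff.
	fmt.Println("retry-after:", retryDelay(0, "3", base, max, rand.Float64))
}
```

Capping the server-provided value is a design choice in this sketch; a client that prefers strict compliance could instead give up when `Retry-After` exceeds its own limit.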
Dependency follow-up plan
When implementing dependency handling, keep these work items together so the known gaps above are not reintroduced:
- Decide which `go list -json` package load errors should be fatal instead of warnings, so partial results cannot silently hide dependencies.
- Make the analyzed build context explicit: preserve or configure `GOFLAGS`, build tags, `GOOS`, `GOARCH`, `CGO_ENABLED`, proxy/private-module settings, and any required code generation before dependency discovery.
- Resolve import paths to module paths before converting them to RPM virtual provides. Multi-module same-prefix filtering is tracked separately in the split-out repository-shape work above.
- Track cgo separately, including test-only `import "C"`, so C compiler, pkg-config, and C library requirements can be emitted when tests need them.
- Emit the already-separated test-only dependency set through target-specific `%check` dependency macros when available.
- Add regression fixtures for external test packages, build-tagged tests, cgo tests, generated test files, and vanity paths. Nested-module and monorepo same-prefix fixtures belong with the split-out repository-shape work above.