Instead of 'loader.debug_type', introduce 'loader.log_level'
and 'loader.log_file', along with a set of definitions for
logging at a chosen level.
For now, the call sites keep using the legacy macros (SGX_DBG and
debug()), because converting them all would conflict with other
big changes in the code base. The existing LibOS calls are
assumed to be at 'info' level.
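A minimal sketch of what level-based logging behind a legacy macro could look like. Only the 'info' level and the two option names come from this commit; the other level names, `parse_log_level()`, `log_at()` and the two globals are hypothetical and are not the actual Graphene definitions.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical levels; the commit itself only mentions 'info'. */
enum log_level { LOG_NONE = 0, LOG_ERROR, LOG_INFO, LOG_DEBUG };

/* Would be filled from loader.log_level and loader.log_file (assumption). */
static enum log_level g_log_level = LOG_ERROR;
static FILE* g_log_file = NULL; /* NULL means "log to stderr" in this sketch */

/* Map a manifest string to a level; returns 0 on success, -1 for an unknown name. */
static int parse_log_level(const char* s, enum log_level* out) {
    static const char* const names[] = {"none", "error", "info", "debug"};
    assert(s && out);
    for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
        if (!strcmp(s, names[i])) {
            *out = (enum log_level)i;
            return 0;
        }
    }
    return -1;
}

/* Emit the message only if `level` is within the configured verbosity.
 * Returns 1 if the message was written, 0 if it was filtered out. */
static int log_at(enum log_level level, const char* fmt, ...) {
    if (level == LOG_NONE || level > g_log_level)
        return 0;
    va_list ap;
    va_start(ap, fmt);
    vfprintf(g_log_file ? g_log_file : stderr, fmt, ap);
    va_end(ap);
    return 1;
}

/* Legacy call sites stay unchanged and are treated as 'info'. */
#define debug(...) log_at(LOG_INFO, __VA_ARGS__)
```

With `g_log_level` left at `LOG_ERROR`, a legacy `debug("...")` call is filtered out; after setting it to `LOG_INFO`, the same call is written to the log.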
This is the next part of the great loader rework, with a lot of breaking changes:
- Complete removal of the "trusted children" concept - now child
processes can be spawned arbitrarily and from arbitrary mountpoint
types, without any additional configuration needed.
- There's a new, required option in the manifest: `libos.entrypoint` - it
specifies the URI of the entry binary of the first process. The manifest
and the first binary no longer need to have identical names.
- On SGX, the main binary is not measured in MRENCLAVE anymore - only
PAL, LibOS and the manifest are measured. This is enough to bind
MRENCLAVE to a specific entrypoint user executable if wanted - it
just has to be mounted as a trusted file.
- All Graphene SGX enclaves now have exactly the same MRENCLAVE. This is
a hash of a "Graphene stub", which can "fork" into one of two states
at runtime: initial process or child. The initial process creates a
new "Graphene namespace" with a clean state; it can also be attested
remotely (unlike child processes). The initial process can spawn
child processes by spawning a Graphene stub and directing it to
start in the child mode. It then attests the child locally and, if
that succeeds, establishes an encrypted pipe, "connects" the child to
its own namespace and treats it as trusted (including sending it the
protected files key).
- Now, there's only one, central manifest describing the initial state
of a Graphene instance which can be spawned from it (previously, each
process required a separate manifest which could have different
configuration - which wasn't actually supported and didn't make sense
design-wise). One downside of central manifests is that all processes
require the same enclave configuration (e.g. size), but that was
already the case so far because of broken checkpointing code. Also,
this is only a temporary problem, which will cease to exist after the
introduction of EDMM.
- `sgx.static_address` was renamed to `sgx.nonpie_binary` and now has to
be inserted manually by users (the `sgx_sign` tool doesn't know about the
binaries run inside, which can even be provided or generated at
runtime by the user's workload).
- Caveat: the memory gap for non-PIE executables was removed because it
requires adding a new option to the manifest to be cleanly
implemented. This is left for some future loader rework PR.
This is a major refactor of the way manifests are loaded and handled,
which will be followed by a complete rework of the loader code (which
will include e.g. centralized config).
Changes/fixes:
- A huge part of the manifest handling was refactored and untangled.
- Starting without a manifest is now disallowed. This was actually
accidentally broken for some time and no one complained. It also makes
little sense in practice and in Graphene's overall design, e.g. it
conflicts with protected argv.
- Now we only allow starting by passing the executable, not the manifest
(the magic resolution logic was removed).
- Now manifests are sent over pipes between parent and children, instead
of children finding and loading them on their own. This is a
preparation for the upcoming centralized manifests change.
- Previously manifests were parsed 2 times on Linux and 3 times on
Linux-SGX (by untrusted PAL, trusted PAL and LibOS). This is now
fixed.
- The common `pal_main()` now requires that the backend-specific PAL
loader loads the manifest before calling it. SGX code already has to
do it (for proper initialization), so let's unify this interface for
all PALs.
- Fix for a PAL crash when the manifest size was divisible by the page
size (sic!). The NUL terminator was missing, but most of the time the
padding to page size saved Graphene from crashing.
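The last fix can be illustrated with a small sketch. This is not the PAL code: `load_manifest_copy()`, `page_align_up()` and the 4096-byte page size are assumptions made for the example. The point is that the terminator byte has to be counted before rounding up to the page size, otherwise a manifest whose size is an exact multiple of the page size gets no padding and therefore no terminator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define EXAMPLE_PAGE_SIZE 4096u /* assumed page size for the illustration */

/* Round `n` up to the next multiple of EXAMPLE_PAGE_SIZE. */
static size_t page_align_up(size_t n) {
    return (n + EXAMPLE_PAGE_SIZE - 1) & ~((size_t)EXAMPLE_PAGE_SIZE - 1);
}

/* Copy `size` bytes of manifest text into a fresh buffer whose size is a
 * multiple of the page size and which is always NUL-terminated.
 * Returns NULL on allocation failure; the caller frees the buffer. */
static char* load_manifest_copy(const char* data, size_t size, size_t* out_alloc) {
    assert(data && out_alloc);
    /* The "+ 1" reserves room for the terminator. Without it, a size that is
     * an exact multiple of the page size leaves no padding byte at the end. */
    size_t alloc = page_align_up(size + 1);
    char* buf = calloc(1, alloc);
    if (!buf)
        return NULL;
    memcpy(buf, data, size);
    buf[size] = '\0';
    *out_alloc = alloc;
    return buf;
}
```

For a 4096-byte manifest this allocates two pages instead of one, so string functions such as `strlen()` stop at the end of the manifest rather than running past the buffer.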
The manifest syntax stays exactly the same, including 0 and 1
integers to denote boolean values (this is done for ease of porting
and can be fixed in future commits). The only visible change is
surrounding strings in the manifest with quotes (requirement of
TOML). All manifests and Makefiles of our tests and example apps are
ported to the new TOML syntax. Documentation is updated.
Supporting these options complicates the design of Graphene and its
loading logic significantly, while providing little useful functionality:
- loader.exec:
  - its main users were our tests
  - it worked only for the first process spawned inside Graphene, as it
    was a unidirectional manifest->binary mapping, so the child
    process didn't know about the corresponding manifest.
- sgx.sigfile:
  - probably all existing usages of it were completely redundant
  - it was resolved relative to CWD instead of the executable location,
    which made it mostly useless
From now on, the files have to be laid out in one of two ways:
- either place the manifest and sigfile next to the binary, with a
matching name, or
- create a symlink to the binary in the folder where manifests are
stored and launch it through this symlink
Previously, we introduced `sgx.zero_heap_on_demand` in Linux-SGX as a
knob to trade off runtime degradation on memory allocations for faster
enclave start-up times. This was an incorrect fix because Linux-SGX's
`_DkVirtualMemoryAlloc()` always zeroes the requested memory region,
so there was a double-zero of the heap at runtime. Note that the LibOS
layer silently assumes that `_DkVirtualMemoryAlloc()` zeroes out the
memory, and many applications rely on this (Apache, Blender in my
experiments). Thus, this commit keeps the zero-out in
`_DkVirtualMemoryAlloc()` and removes zero-outs on enclave init and in
`get_enclave_pages()`. This renders `sgx.zero_heap_on_demand`
useless, so this manifest option is also removed. Also note that this
commit doesn't introduce any performance degradation (in fact, now
Graphene behaves as if `sgx.zero_heap_on_demand = 1` always).
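The resulting division of labor can be sketched with a toy heap. Everything here is illustrative: `toy_get_pages()` and `toy_virtual_alloc()` are hypothetical stand-ins for `get_enclave_pages()` and `_DkVirtualMemoryAlloc()`, and the static array merely plays the role of the enclave heap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE 4096u
#define TOY_HEAP_PAGES 4u

/* Toy enclave heap. It is never zeroed at "enclave init" by this code. */
static unsigned char g_heap[TOY_HEAP_PAGES * TOY_PAGE];
static size_t g_next_page = 0;

/* Stand-in for get_enclave_pages(): hands out pages without touching
 * their contents. Returns NULL if the request cannot be satisfied. */
static void* toy_get_pages(size_t npages) {
    assert(g_next_page <= TOY_HEAP_PAGES);
    if (npages == 0 || npages > TOY_HEAP_PAGES - g_next_page)
        return NULL;
    void* p = &g_heap[g_next_page * TOY_PAGE];
    g_next_page += npages;
    return p;
}

/* Stand-in for _DkVirtualMemoryAlloc(): the single place that zeroes memory,
 * so callers can rely on zero-filled pages without a second pass elsewhere. */
static void* toy_virtual_alloc(size_t npages) {
    void* p = toy_get_pages(npages);
    if (p)
        memset(p, 0, npages * TOY_PAGE);
    return p;
}
```

Each page is zeroed exactly once, at the moment it is handed to a caller, and pages that are never allocated are never touched.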
The documentation currently specifies SGX_SIGNER_KEY as the environment
variable that enables Graphene to find your keys.
Some examples don't use this environment variable; this commit fixes
that.
On Ubuntu 'which' finds 'cp' in '/bin/cp' and on Fedora in
'/usr/bin/cp'. Rather than hard-coding the path '/bin', use
$(EXECDIR) and derive its value from the dirname of the path of the
executable, i.e., either '/bin' or '/usr/bin'.
Extend Makefile.configs to define several make variables derived from
the output of 'gcc -dumpmachine'. In particular:
- ARCH as the architecture, e.g., x86_64
- ARCH_LONG as the long version of the architecture, e.g., x86_64-linux-gnu
- ARCH_LIBDIR as the directory where libraries are located,
e.g., /lib/x86_64-linux-gnu
In Makefiles and manifest templates, replace the hard-coded
x86_64-linux-gnu and /lib/x86_64-linux-gnu with these variables.
Extend the already existing sed scripts to replace the necessary
variables.
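The actual change lives in make and sed, but the string derivation itself is simple enough to sketch in C. `derive_arch_vars()` and `struct arch_vars` are hypothetical names, and the sketch assumes the machine string has the form shown in the examples above (architecture first, then a '-').

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct arch_vars {
    char arch[32];        /* e.g. x86_64 */
    char arch_long[64];   /* e.g. x86_64-linux-gnu */
    char arch_libdir[80]; /* e.g. /lib/x86_64-linux-gnu */
};

/* Fill `out` from a 'gcc -dumpmachine'-style string.
 * Returns 0 on success, -1 if the string has no architecture prefix
 * followed by '-' or does not fit into the fixed-size buffers. */
static int derive_arch_vars(const char* machine, struct arch_vars* out) {
    assert(machine && out);
    const char* dash = strchr(machine, '-');
    if (!dash || dash == machine)
        return -1;
    size_t arch_len = (size_t)(dash - machine);
    if (arch_len >= sizeof(out->arch) || strlen(machine) >= sizeof(out->arch_long))
        return -1;
    memcpy(out->arch, machine, arch_len); /* ARCH: everything before the first '-' */
    out->arch[arch_len] = '\0';
    strcpy(out->arch_long, machine);      /* ARCH_LONG: the full string */
    snprintf(out->arch_libdir, sizeof(out->arch_libdir), "/lib/%s", out->arch_long);
    return 0;
}
```

Passing "x86_64-linux-gnu" yields the three example values listed above.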