Autoinstall related failures are more likely than not going to be
user caused, so we shouldn't immediately generate a crash report
for these types of failures. This should hopefully allow the user
to debug their autoinstall data much faster and reduce the number
of autoinstall-related bugs reported.
When setting up the logging in a strictly confined snap, use the 'root' group,
rather than 'adm'. This will not interfere with the sandbox's policy but also
does not result in providing wider access to the logs.
Signed-off-by: Maciej Borzecki <maciej.borzecki@canonical.com>
We write out the autoinstall data to make the install repeatable
but this should also include a reference to the autoinstall
documentation to increase usability.
In Python < 3.11, when passing a coroutine to asyncio.wait, it would
automatically be scheduled as a task. This isn't the case anymore with
Python 3.11. Now passing coroutines to asyncio.wait fails with:
TypeError: Passing coroutines is forbidden, use tasks explicitly.
Let's ensure we schedule the coroutines as tasks before passing them on
to asyncio.wait.
Signed-off-by: Olivier Gayot <olivier.gayot@canonical.com>
This commit re-adds some of the shared mock fields for testing
and removes a bad import from test_snaplist. These are changes
that shouldn't have been part of the previously reverted patch:
0a70a969d4
This reverts commit 39f1ea9cb6. The fix proposed
in this patch caused more issues than it fixed. We will have to revisit this in
a more nuanced way in the future. In the meantime users can make use of env
directly to strip/modify the subcommand environment.
While these changes are not supposed to take nearly this long,
per LP: #2034715 we know that they are, and that some systems will
correctly perform the finish_install() step if just given more time.
We recently made sure that after doing a snap refresh, the rich mode
(i.e., either rich or basic) is preserved. This was implemented by
storing the rich mode in a state file. When the client starts, it loads
the rich mode from said state file if it exists.
Unfortunately, on s390x, it causes installs to default to basic mode.
This happens because on this architecture, a subiquity install consists
of:
* a first client (over serial) showing the SSH password
* a second client (logging over SSH) actually going through the
installation UI.
Since the first client uses a serial connection, the state file is
created with rich-mode set to basic. Upon connecting using SSH, the
state file is read and the rich-mode is set to basic as well.
Fixed by storing the rich-mode in two separate files, one for clients
over serial and one for other clients.
LP: #2036096
Signed-off-by: Olivier Gayot <olivier.gayot@canonical.com>
Adding this import means a dependency on probert, which also means
anybody importing subiquity.common.types also has that requirement.
The make-kbd-info script imports types, and that steps was causing
snapcraft build failures due to not finding probert.
When a network interface is disconnected from the system (e.g.,
physically removed if it's a USB adapter), probert asynchronously calls
the del_link() method.
Upon receiving this notification, Subiquity server wants to send an
update to the Subiquity clients. The update contains information about
the interface that disappeared - which is obtained through a call to
netdev_info.
Unfortunately, for Wi-Fi and Ethernet interfaces, netdev_info
dereferences the NetworkDev.info variable. Interfaces that no longer
exist on the system (and also interfaces that do not yet exist), have
their "info" variable set to None - so an exception is raised when
dereferencing it.
Wi-Fi interface:
File "subiquitycore/models/network.py", line 227, in netdev_info
scan_state=self.info.wlan['scan_state'],
AttributeError: 'NoneType' object has no attribute 'wlan'
Ethernet interface:
File "subiquitycore/models/network.py", line 201, in netdev_info
is_connected = bool(self.info.is_connected)
AttributeError: 'NoneType' object has no attribute 'is_connected'
Fixed by making sure netdev_info does not raise if the dev.info variable
is None. This is a valid use-case.
Signed-off-by: Olivier Gayot <olivier.gayot@canonical.com>
Currently, the partition form stores the size as a human readable value.
(e.g., 123456K, 1.1G, 1.876G, 100G). When we exit the size field (i.e., upon
losing focus), we convert the value to a number of bytes and then align
it up to the nearest MiB (or whatever the alignment requirement is).
Unfortunately, after computing the aligned value, we turn it back into a
human-readable string and store it as is. It is not okay because the
conversion does not ensure that the alignment requirement is still
honored.
For instance, if the user types in 1.1G, we do the following:
* convert it to a number of bytes -> 1181116006.4 (surprise, it is not
even an integer).
* round it up to the nearest MiB -> 1181745152 (this is the correct
value)
* transform it into a human readable string and store it as is -> 1.1G
- which actually corresponds to the original value.
This leads to an exception later when creating the partition:
File "subiquity/models/filesystem.py", line 1841, in add_partition
raise Exception(
Exception: ('size %s or offset %s not aligned to %s', 1181116006, 1048576, 1048576)
Fixed by storing the actual size as a number of bytes - alongside the
human readable size.
Signed-off-by: Olivier Gayot <olivier.gayot@canonical.com>
Move some of this logic to a common util, so it can be used in
unittests. The curtin version is close but expects to work on a storage
config, which is not a flat list.
LD_LIBRARY_PATH is set earlier than some of the other environment
variables like PATH or PYTHONPATH, so trying to save a clean version in
snapcraft is not viable. Remove it here.
In LP: 2009797, an exception of this form happens:
AttributeError: 'FilesystemController' object has no attribute '_start_task'
The installer client, u-d-i, is asking for storage information ASAP
after the socket starts listening, and in this case that happened before
all controllers were started. The sync primitive the probe is waiting
on wasn't created yet.
With one known exception, /meta/status, we really shouldn't be
responding to random API calls, and the startup sequence of the
controllers should be relatively quick (sub 1 second to be sure).
Just delay them, except for the special one.
SingleInstanceTask has distinct steps for creation of the object, and
starting the task. If a different coroutine is waiting on the
SingleInstanceTask, it isn't safe to directly call
SingleInstanceTask.wait() as the task may or may not have been created
yet.
Existing code usage of SingleInstanceTask is in 4 categories, with
reguards to SingleInstanceTask.wait():
1) using SingleInstanceTask without using SingleInstanceTask.wait().
This is unchanged.
2) using SingleInstanceTask.wait without a check on task is not None.
This may be safe now, but is fragile in the face of innocent-looking
refactors around the SingleInstanceTask.
3) using SingleInstanceTask.wait after confirming that the task is not
None. This is fine but a leaky abstraction.
4) directly waiting on the SingleInstanceTask.task. Another leaky
abstraction, but it's solving a cancellation problem. Leaving this
alone.
By enhancing SingleInstanceTask.wait(), cases 2 and 3 are improved. The
code not checking the task today is made safer, and the code checking
the task today can be simplified.
Network manager can create routes at metric aka priority above 20000.
These can stick around if they are not the best choice, or they may
disappear quickly.
Do not consider one of these routes as a valid default route for
has_network purposes.
The existing event based method of watching for has_network has a flaw.
The incoming route_change events from probert do not distinguish routes
on the same interface but a different metric, so if 2 routes on one
interface appear, we only get one event. Then if one of those routes is
removed, we will inappropriately remove this route from the
default_routes list.
Aside from the code watching the event stream, the set of default routes
is an elaborate boolean value.
Simplify the code by passing around a boolean, and when we get a
route_change event, use that to go looking again at the list of default
routes.
LP: #2004659