In LP: 2009797, an exception of this form happens:
AttributeError: 'FilesystemController' object has no attribute '_start_task'
The installer client, u-d-i, is asking for storage information ASAP
after the socket starts listening, and in this case that happened before
all controllers were started. The sync primitive the probe is waiting
on wasn't created yet.
With one known exception, /meta/status, we really shouldn't be
responding to API calls before the controllers have started, and the
startup sequence of the controllers should be relatively quick (under
1 second, to be sure). Delay all such calls until startup completes,
except for the special one.
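As a rough illustration of the delay described above, here is a minimal
sketch, assuming an asyncio.Event gates request handling. The Server
class, UNGATED_PATHS, and handle() are hypothetical names for this
example and are not the actual server code.

```python
import asyncio

# Hypothetical: paths that must be answered even before startup completes.
UNGATED_PATHS = {"/meta/status"}


class Server:
    """Illustrative stand-in for the API server, not the real class."""

    def __init__(self):
        # Set once every controller has finished starting.
        self.controllers_ready = asyncio.Event()

    async def start_controllers(self, controllers):
        for controller in controllers:
            await controller.start()
        self.controllers_ready.set()

    async def handle(self, path, handler):
        # Every request except the special endpoint waits for startup.
        if path not in UNGATED_PATHS:
            await self.controllers_ready.wait()
        return await handler()
```

With this shape, an early storage request simply blocks for the short
startup window instead of touching a half-initialized controller.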
SingleInstanceTask has distinct steps for creation of the object, and
starting the task. If a different coroutine is waiting on the
SingleInstanceTask, it isn't safe to directly call
SingleInstanceTask.wait() as the task may or may not have been created
yet.
Existing usage of SingleInstanceTask falls into 4 categories, with
regard to SingleInstanceTask.wait():
1) using SingleInstanceTask without using SingleInstanceTask.wait().
This is unchanged.
2) using SingleInstanceTask.wait() without checking that the task is not None.
This may be safe now, but is fragile in the face of innocent-looking
refactors around the SingleInstanceTask.
3) using SingleInstanceTask.wait after confirming that the task is not
None. This is fine but a leaky abstraction.
4) directly waiting on the SingleInstanceTask.task. Another leaky
abstraction, but it's solving a cancellation problem. Leaving this
alone.
By enhancing SingleInstanceTask.wait(), cases 2 and 3 are improved. The
code not checking the task today is made safer, and the code checking
the task today can be simplified.
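One way the enhanced wait() could look, as a minimal sketch: it assumes
an asyncio.Event named task_created (a name chosen for this example)
records that the task exists. The class below is a simplified stand-in
and omits the cancellation handling of the real SingleInstanceTask.

```python
import asyncio


class SingleInstanceTask:
    """Simplified stand-in for illustration only."""

    def __init__(self, func):
        self.func = func
        self.task = None
        # Assumption: an Event marks the moment the task object exists.
        self.task_created = asyncio.Event()

    async def start(self, *args, **kwargs):
        self.task = asyncio.ensure_future(self.func(*args, **kwargs))
        self.task_created.set()
        return self.task

    async def wait(self):
        # Block until start() has run, then wait for the task itself.
        await self.task_created.wait()
        return await self.task
```

Callers in category 3 could then drop their "task is not None" check,
and callers in category 2 would no longer depend on start() having
already run.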
We aren't close to hitting this timeout in unit tests that I can see,
but some of the asyncio tests I'm currently writing risk a hang, and I
don't want CI to get stuck.
A reasonably common type of bug report is hitting the block probing
timeout. That timeout is at 15 seconds today. Extend that to 90
seconds, and log the time it takes.
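A minimal sketch of the timeout-plus-timing idea, assuming the probe is
run under asyncio.wait_for. The names probe_with_timeout and
PROBE_TIMEOUT, and the logger name, are hypothetical and chosen only
for this example.

```python
import asyncio
import logging
import time

log = logging.getLogger("probe")

PROBE_TIMEOUT = 90.0  # seconds; the previous value was 15


async def probe_with_timeout(probe, timeout=PROBE_TIMEOUT):
    # `probe` is a hypothetical zero-argument coroutine function.
    start = time.monotonic()
    try:
        return await asyncio.wait_for(probe(), timeout)
    finally:
        # Log the duration whether the probe finished or timed out.
        elapsed = time.monotonic() - start
        log.debug("block probing took %.1f seconds", elapsed)
```

Logging in the finally clause means slow-but-successful probes show up
in the logs too, which is the data needed to see where the time goes.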
As a performance exercise, it would be good to know where that time is
being spent and why it takes 10x longer or more for some people.