I wanted to apologize to Sini Ruohomaa for letting eir review languish so long in my inbox.

I'll also note that I've presented about this document a few times in the past month under the title 'Social and Technical Metrics for Trust Anchor Resilience':

1) IoT Security Foundation meeting, Oct. 5. https://iotsfconference.com/agenda/ YouTube video coming.
2) TPM.DEV conference, Oct. 13. https://www.linkedin.com/events/tpmdev2022conference6977630882559864832/comm... Video links coming.
3) RIPE-85 IoT WG. https://ripe85.ripe.net/archives/video/950/

Slides at: https://www.sandelman.ca/SSW/talk/2022-iotsf-anchor-reputations/index.shtml

It has also been presented with different slides a few other times during the pandemic. One advantage of doing this in person is that I get to see nonverbal nods of agreement, or looks of confusion, from the audience, and I think that the concepts have been generally very well received. But I'm sure that the actual definitions could use some work, and perhaps that will happen when the RG adopts the document.

I've also had about ten private discussions with trust anchor creators/technology creators such as CPU and TEE manufacturers. Getting them on the record has been... challenging. Taking the document out to industry was the homework I was assigned by SECDISPATCH back in 2022.

Sini wrote:

> I would like to pass my warm greetings to the authors for the
> effort of trying to make this more communicable. I would like
> to reference this document when it is ready. It is written in a very
> approachable way so I see value in it also as an educational document,
> not just an evaluation framework.

Thank you for the encouraging words.

> "An increasing number of protocols derive a significant part of their
> security by using trust anchors [RFC4949] that are installed by
> manufacturers. Disclosure of the list of trust anchors does not usually
> cause a problem, but changing them in any way does. This includes
> adding, replacing or deleting anchors. [RFC6024] deals with how trust
> anchor stores are managed, while this document deals with how the
> associated PKI which is anchor is managed."
>
> The last sentence is a bit hard to parse.

Changed to:

+ The document {{RFC6024}} deals with how trust anchor stores are managed in the device which uses them.
+ This document deals with how the PKI associated with such a trust anchor is managed.

I wonder if that's actually any better.

> "Device identity keys are used when performing enrollment requests (in
> [RFC8995], and in some uses of [I-D.ietf-emu-eap-noob]. The device
> identity certificate is also used to sign Evidence by an Attesting
> Environment (see [I-D.ietf-rats-architecture])."
>
> One closing parenthesis ")" is missing.

Fixed.

> 3.6 "what else" question
>
> The identities trust anchor for validating identities of non-web-PKI
> devices in case they need to ever talk to each other, or the identity
> trust anchor of a vendor-specific service that the device must connect
> to in order to do e.g. enrolment of other identities, setting up
> attestation or zero-touch bootstrapping come to mind (this allows the
> device too to know who it is talking to, in addition to having the
> device authenticate towards such an enrolment service).
>
> I would cover these with an 'other identity validation trust anchors'
> category in addition to the public web PKI trust anchors, as the
> category is probably too wide to cover satisfactorily an individual
> subcategory at a time.

I have added sections:

- Private/Cloud PKI anchors
- Onboarding/Enrollment anchors
- Onboarded Network-Local anchors

> 4.1: minor
>
> "[ieee802-1AR] defines a category of certificates that are installed by
> the manufacturer, which contain at the least, a device unique serial
> number."
>
> Had trouble parsing the sentence, is the comma right?

Changed to:

  {{ieee802-1AR}} defines a category of certificates that are installed
  by the manufacturer which contain a device unique serial number.

> "This may be a software setting, or could be as dramatic as blowing a
> fuse."
>
> I really appreciate the author's hearty writing style here. :D

:-) You should imagine a Star Wars/Geonosis droid factory, and/or James Bond/Dr. No, with a huge laser gently browsing across defenseless router parts on a conveyor belt, permanently stencilling them with an identity. (With so many droids in the Star Wars universe, surely "R2D2" isn't a serial number, but rather a model number?)

> 4.1.2.2
>
> "Generating the key off-device has the advantage that the randomness of
> the private key can be better analyzed."
>
> A second benefit that I expect has been more commercially interesting
> is that (assuming serial numbers or other relevant identities can be
> pre-reserved) the certificates can be pre-issued, which eliminates both
> CA latency and CA dependency from the manufacturing process itself. I
> still would not recommend off-device generation though.

I have emphasized this point a bit at:
https://github.com/mcr/idevid-security-considerations/pull/4/commits/4ba7a46...
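To make the tradeoff concrete, here is a rough sketch of the pre-issuance flow you describe, in Python with the pyca/cryptography package. This is only an illustration, not text from the draft: ca_key/ca_cert, the helper name, and the organization are all invented.

```python
# Sketch: the factory generates the device key off-device and
# pre-issues an IDevID-style certificate against a pre-reserved
# serial number, so the manufacturing line never blocks on the CA.
import datetime

from cryptography import x509
from cryptography.x509.oid import NameOID
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec

# 802.1AR suggests the GeneralizedTime 99991231235959Z for "no expiry".
NO_EXPIRY = datetime.datetime(9999, 12, 31, 23, 59, 59,
                              tzinfo=datetime.timezone.utc)

def pre_issue_idevid(ca_key, ca_cert, device_serial: str):
    # The key is born outside the device: this is exactly the "wider
    # exposure to insider attack" being debated.
    dev_key = ec.generate_private_key(ec.SECP256R1())

    subject = x509.Name([
        # 802.1AR IDevIDs carry a device-unique serial number.
        x509.NameAttribute(NameOID.SERIAL_NUMBER, device_serial),
        x509.NameAttribute(NameOID.ORGANIZATION_NAME, "Example Widgets"),
    ])
    cert = (
        x509.CertificateBuilder()
        .subject_name(subject)
        .issuer_name(ca_cert.subject)
        .public_key(dev_key.public_key())
        .serial_number(x509.random_serial_number())
        .not_valid_before(datetime.datetime.now(datetime.timezone.utc))
        .not_valid_after(NO_EXPIRY)
        .sign(ca_key, hashes.SHA256())
    )
    # The pair is queued for injection at first-stage initialization;
    # the "never store" advice means wiping dev_key immediately after
    # injection, though, as you note, computational traces remain.
    return dev_key, cert
```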
> Due to the leak
> potential and wider exposure to insider attack it has a creepy ring to
> it that may undermine the trust placed in the PKI. Even if we 'never
> store' the key (as is suggested later in this section) we will leave
> computational traces of the key around.

I agree. Yet, this reduction in cost might be the difference between having a real IDevID certificate loaded into your microwave, and having some BS attempt to do fake security. A major point of this document is for manufacturers to feel that it's okay to tell us what they did. The judgement can come later through market opinions.

> 4.1.2.3
>
> Agree with the conclusion that this method shares the downsides of both
> above.

Do you have any suggestions as to a better name? The CPU/MCU makers are very keen on this method, and it accommodates the whole "PUF" mechanism too. I am personally not convinced PUF is truly random, but I don't think we'll know until someone clones one.

> 5. PKIs
>
> "A PKI which discloses one or more private certification authority keys
> is no longer secure."
>
> Technically, subCAs exist for the sake that they CAN be revoked by a
> higher level CA, to recover from compromise of an issuing level CA
> which is typically more exposed than the higher levels. However,
> everything issued from that subCA is indeed due for hardware recall so
> the event is major. But the most major event is leaking the root CA, as
> there is no way to recover from that within the same root chain.

Yes: subordinate CAs limit the damage when a key is lost to a malicious actor.

> To get really into detail though, a well managed CA is able to keep
> track of public keys it has intentionally signed into certificates, and
> depending on the use case even a compromised subCA key could
> theoretically be worked around in the field through administrative
> means that basically let go of the cryptographic protection of the PKI
> chain in favour of administrative tracking. This basically turns into
> some kind of administrative CRL if you take it far enough. I think that
> part would be hackish and beyond the scope of this text, however. Just
> can't help myself going through the what-ifs; these PKIs have to be
> seriously robust.
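For what it's worth, that administrative check could be as small as this toy sketch (hypothetical Python; the issuance-log format and all names are invented): the CA keeps a log of every public key it intentionally certified, and a relying party consults it alongside, or after a subCA compromise instead of, the cryptographic chain.

```python
# Toy "administrative CRL": check a presented certificate's public key
# against the log of keys the CA intentionally signed into certificates.
import hashlib

from cryptography import x509
from cryptography.hazmat.primitives import serialization

def load_issuance_log(path: str) -> set[bytes]:
    """One SHA-256 digest (hex) of a DER SubjectPublicKeyInfo per line."""
    with open(path) as f:
        return {bytes.fromhex(line.strip()) for line in f if line.strip()}

def was_intentionally_issued(cert: x509.Certificate,
                             issued: set[bytes]) -> bool:
    spki = cert.public_key().public_bytes(
        serialization.Encoding.DER,
        serialization.PublicFormat.SubjectPublicKeyInfo,
    )
    return hashlib.sha256(spki).digest() in issued
```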
Well, again, they need to be robust enough for the purpose we expect of them. At one IoTSF conference, a pen-tester described finding a building-control management system for a major airport on the Internet with **default** passwords. No amount of good HTTPS certificates on the server side of that system helps you out there. A system of good trust anchors and client-side certificates would be better, assuming that the system needed to be reachable at all.

OTOH, an Internet-connected alarm clock (with a cartoon character) for my kids' bedroom might not need such a robust PKI: just one strong enough to onboard it to my network.

> Just imagine a full hardware recall of a car manufacturer for a
> "oops need a quick bootstrapping identity PKI fix". Or of anything
> resembling a home wifi access point box that end users don't even touch
> if it's still working.

Yes, agreed. Imagine a fleet of ambulances that had to be grounded because the radio had certificates whose CA was compromised.

> 5.1
>
> "It is quite common to have a three-level PKI, where the root of the CA
> is stored in a Hardware Security Module, while the level one
> subordinate CA is available in an online form."
>
> This formulation implies that subCAs would not be in hardware security
> modules or that they cannot offer CAs available in online form. I think
> we should definitely underline the importance of keeping subCAs in HSMs
> as well.

So, I wrote that text trying to emphasize that the root CA had to be in an HSM that was offline, while the subordinate CA had to be online in order to issue EE certificates. I wasn't trying to say that it couldn't be in an HSM! Rereading it also reveals that I've messed up the level text as well.

> I would separate the two dimensions explicitly and say "where the root
> of the CA is stored in a Hardware Security Module in a way that it
> cannot be continuously accessed ('offline'), while the subordinate CA
> can sign certificates at any time ('online')." Or potentially even
> another formulation that mentions completely separately that "CA keys
> are stored in Hardware Security Modules so that the root ... and the
> subordinate ...".

I've adapted your text.

> The section below talks about this in more length so this could also be
> migrated there.
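To make the offline/online split concrete, here is a rough Python sketch of the three-level structure (again pyca/cryptography; all names are invented, and a real deployment keeps both CA keys in HSMs).

```python
# The distinction is operational, not about HSMs: the root key is used
# once to sign the subordinate and then goes back in the safe
# ("offline"), while the subordinate key must stay reachable to issue
# end-entity certificates ("online").
import datetime

from cryptography import x509
from cryptography.x509.oid import NameOID
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec

def make_ca(name: str, issuer_key=None, issuer_cert=None):
    key = ec.generate_private_key(ec.SECP256R1())
    subject = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, name)])
    now = datetime.datetime.now(datetime.timezone.utc)
    cert = (
        x509.CertificateBuilder()
        .subject_name(subject)
        .issuer_name(issuer_cert.subject if issuer_cert else subject)
        .public_key(key.public_key())
        .serial_number(x509.random_serial_number())
        .not_valid_before(now)
        .not_valid_after(now + datetime.timedelta(days=365 * 20))
        .add_extension(x509.BasicConstraints(ca=True, path_length=None),
                       critical=True)
        .sign(issuer_key if issuer_key else key, hashes.SHA256())
    )
    return key, cert

root_key, root_cert = make_ca("Example Root CA")     # offline, in the safe
sub_key, sub_cert = make_ca("Example Issuing CA",
                            issuer_key=root_key,
                            issuer_cert=root_cert)   # online HSM
# sub_key/sub_cert are what the earlier pre_issue_idevid() sketch would
# use as ca_key/ca_cert; root_key is only touched again to replace a
# compromised subordinate or to sign a CRL.
```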
> 6.1
>
> "first-stage initialization": focusing on the raw number of people
> 'involved' feels maybe a bit arbitrary.

Okay.

> - it is also important what level of access these people have to the
>   device in practice, or are they performing primarily a monitoring task
> - considering a factory shopfloor, it may be misleading to think that
>   only 3 people are involved because 3 people actually are *supposed* to
>   be touching the board, if in fact 300 other people have access to the
>   actual space and could pop by to intervene with the process if it's not
>   happening in a separate locked room. (Note that the janitors could also
>   be bribed and planted/instructed even though no one would count them as
>   'involved' by default.)

Your thoughts are spot on. How many people are involved that could be bribed? If it's not in a locked room, then I would expect to write down 300 people in the factory. How would you rewrite this section to make it clearer?

> I might redefine this as 'exposure' of the device while in the process,
> where a fully automatic closed area might be 'low' and a large shopfloor
> where 20+ people have physical access to actually go physically touch
> the device or the equipment processing it as 'high', for example.

I've changed the metric to be called "first-stage-exposure".

> first-second-stage-gap: This factor is not formulated as a question.
> (Some later factors are not capitalized and punctuated consistently.)

Changed to:

  first-second-stage-gap:
  : How far, and for how long, does a board travel between being
    initialized with a first-stage bootloader and being locked down so
    that changes to the bootloader can no longer be made? For many
    situations there is no distance at all, as both steps occur in the
    same factory; in other situations boards are manufactured and tested
    in one location, but are initialized elsewhere.

At this point, I'm going to start a new email!

--
Michael Richardson <mcr+IETF@sandelman.ca>, Sandelman Software Works
 -= IPv6 IoT consulting =-