Reviews of draft-richardson-t2trg-idevid-considerations (part 1)
I wanted to apologize to Sini Ruohomaa for letting eir review languish so
long in my inbox.

I'll also note that I've presented this document a few times in the past
month under the title: 'Social and Technical metrics for Trust Anchor
resilience'

1) IoT Security Foundation meeting, https://iotsfconference.com/agenda/.
   Oct. 5. Youtube coming.
2) TPM.DEV, Oct. 13,
   https://www.linkedin.com/events/tpmdev2022conference6977630882559864832/comm...
   video links coming.
3) RIPE-85 IoT WG. https://ripe85.ripe.net/archives/video/950/

Slides at: https://www.sandelman.ca/SSW/talk/2022-iotsf-anchor-reputations/index.shtml

It's also been presented with different slides a few other times during the
pandemic. One advantage of doing this in person is that I get to see
nonverbal nods of agreement, or looks of confusion, from the audience, and I
think that the concepts have been generally very well received. But I'm sure
that the actual definitions could use some work, and perhaps that will happen
when the RG adopts the document.

I've also had about ten private discussions with trust anchor
creators/technology creators such as CPU and TEE manufacturers. Getting them
on the record has been... challenging. Taking the document out to industry
was the homework I was assigned by SECDISPATCH back in 2022.

Sini wrote:

> I would like to pass my warm greetings to the authors for the
> effort of trying to make this more communicable. I would like
> to reference this document when it is ready. It is written in a very
> approachable way so I see value in it also as an educational document,
> not just an evaluation framework.

Thank you for the encouraging words.

> "An increasing number of protocols derive a significant part of their
> security by using trust anchors [RFC4949] that are installed by
> manufacturers. Disclosure of the list of trust anchors does not usually
> cause a problem, but changing them in any way does. This includes
> adding, replacing or deleting anchors. [RFC6024] deals with how trust
> anchor stores are managed, while this document deals with how the
> associated PKI which is anchor is managed."
>
> The last sentence is a bit hard to parse.

Changed to:

+ The document {{RFC6024}} deals with how trust anchor stores are managed in the device which uses them.
+ This document deals with how the PKI associated with such a trust anchor is managed.

I wonder if that's actually any better.

> "Device identity keys are used when performing enrollment requests (in
> [RFC8995], and in some uses of [I-D.ietf-emu-eap-noob]. The device
> identity certificate is also used to sign Evidence by an Attesting
> Environment (see [I-D.ietf-rats-architecture])."
>
> One closing parenthesis ")" is missing.

Fixed.

> 3.6 "what else" question
>
> The identities trust anchor for validating identities of non-web-PKI
> devices in case they need to ever talk to each other, or the identity
> trust anchor of a vendor specific service that the device must connect
> to in order to do e.g. enrolment of other identities, setting up
> attestation or zero-touch bootstrapping come to mind (this allows the
> device too to know who it is talking to, in addition to having the
> device authenticate towards such an enrolment service).
>
> I would cover these with 'other identity validation trust anchors' in
> addition to the public web PKI trust anchors, as the category is
> probably too wide to cover satisfactorily an individual subcategory at
> a time.
I have added sections:

   Private/Cloud PKI anchors
   Onboarding/Enrollment anchors
   Onboarded Network-Local anchors

> 4.1: minor
>
> "[ieee802-1AR] defines a category of certificates that are installed by
> the manufacturer, which contain at the least, a device unique serial
> number."
>
> Had trouble parsing the sentence, is the comma right?

Changed to:

{{ieee802-1AR}} defines a category of certificates that are installed by the
manufacturer which contain a device unique serial number.

> "This may be a software setting, or could be as dramatic as blowing a
> fuse."
>
> I really appreciate the author's hearty writing style here. :D

:-) You should imagine a Star Wars/Geonosis-Droid-Factory and/or James
Bond/Dr. No with a huge laser gently browsing across defenseless router parts
on a conveyor belt, permanently stencilling them with an identity. (With so
many droids in the Star Wars universe, surely "R2D2" isn't a serial number,
but rather a model number?)

> 4.1.2.2
>
> "Generating the key off-device has the advantage that the randomness of
> the private key can be better analyzed."
>
> A second benefit that I expect has been more commercially interesting
> is that (assuming serial numbers or other relevant identities can be
> pre-reserved) the certificates can be pre-issued, which eliminates both
> CA latency and CA dependency from the manufacturing process itself. I
> still would not recommend off-device generation though.

I have emphasized this point a bit at:
https://github.com/mcr/idevid-security-considerations/pull/4/commits/4ba7a46...

> Due to the leak
> potential and wider exposure to insider attack it has a creepy ring to
> it that may undermine the trust placed in the PKI. Even if we 'never
> store' the key (as is suggested later in this section) we will leave
> computational traces of the key around.

I agree. Yet, this reduction in cost might be the difference between having a
real IDevID certificate loaded into your microwave, and having some BS
attempt at fake security. A major point of this document is for manufacturers
to feel that it's okay to tell us what they did. The judgement can come later
through market opinions.

> 4.1.2.3
>
> Agree with the conclusion that this method shares the downsides of both
> above.

Do you have any suggestions as to a better name? The CPU/MCU makers are very
keen on this method, and it accommodates the whole "PUF" mechanism too. I am
personally not convinced a PUF is truly random, but I don't think we'll know
until someone clones one.

> 5. PKIs
>
> "A PKI which discloses one or more private certification authority keys
> is no longer secure."
>
> Technically, subCAs exist for the sake that they CAN be revoked by a
> higher level CA, to recover from compromise of an issuing level CA
> which is typically more exposed than the higher levels.
> However,
> everything issued from that subCA is indeed due for hardware recall so
> the event is major. But the most major event is leaking the root CA, as
> there is no way to recover from that within the same root chain.

Yes, so subordinate CAs limit the damage of a key lost to a malicious actor.

> To get really into detail though, a well managed CA is able to keep
> track of public keys it has intentionally signed into certificates, and
> depending on the use case even a compromised subCA key could
> theoretically be worked around in the field through administrative
> means that basically let go of the cryptographic protection of the PKI
> chain in favour of administrative tracking.
> This basically turns into some kind of administrative CRL if you take it
> far enough. I think that part would be hackish and beyond the scope of
> this text, however. Just can't help myself going through the what-ifs;
> these PKIs have to be seriously robust.

Well, again, they need to be robust enough for the purpose we expect of them.
At one IoTSF conference, a pen-tester described finding a building-control
management system for a major airport on the Internet with **default**
passwords. No amount of good HTTPS certificates on the server side of that
system helps you here. A system of good trust anchors and client-side
certificates would be better, assuming that the system needed to be
reachable at all.

OTOH, an Internet-connected alarm clock (with a cartoon character) for my
kids' bedroom might not need such a robust PKI. Just one strong enough to
onboard it to my network.

> Just imagine a full hardware recall of a car manufacturer for a
> "oops need a quick bootstrapping identity PKI fix". Or of anything
> resembling a home wifi access point box that end users don't even touch
> if it's still working.

Yes, agreed. Imagine a fleet of ambulances that had to be grounded because
the radio had certificates whose CA was compromised.

> 5.1
>
> "It is quite common to have a three-level PKI, where the root of the CA
> is stored in a Hardware Security Module, while the level one
> subordinate CA is available in an online form."
>
> This formulation implies that subCAs would not be in hardware security
> modules or that they cannot offer CAs available in online form. I think
> we should definitely underline the importance of keeping subCAs in HSMs
> as well.

So, I wrote that text trying to emphasize that the root CA had to be in an
HSM that was offline, while the subordinate CA had to be online to be able
to issue EE certificates. I wasn't trying to say that it couldn't be in an
HSM! Reading it also reveals that I've messed up the level text as well.
(There is a toy illustration of the levels in a PS at the end of this
message.)

> I would separate the two dimensions explicitly and say "where the root
> of the CA is stored in a Hardware Security Module in a way that it
> cannot be continuously accessed ('offline'), while the subordinate CA
> can sign certificates at any time ('online')." Or potentially even
> another formulation that mentions completely separately that "CA keys
> are stored in Hardware Security Modules so that the root ... and the
> subordinate ...".

I've adapted your text.

> The section below talks about this in more length so this could also be
> migrated there.
>
> 6.1
>
> "first-stage initialization" focusing on the raw number of people
> 'involved' feels maybe a bit arbitrary.

Okay.

> - it is also important what level of access these people have to the
>   device in practice, or are they performing primarily a monitoring task
> - considering a factory shopfloor, it may be misleading to think that
>   only 3 people are involved because 3 people actually are *supposed* to
>   be touching the board, if in fact 300 other people have access to the
>   actual space and could pop by to intervene with the process if it's not
>   happening in a separate locked room. (Note that the janitors could also
>   be bribed and planted/instructed even though no one would count them as
>   'involved' by default.)

Your thoughts are spot on. How many people are involved that could be
bribed? If it's not in a locked room, then I would expect to write down 300
people in the factory. How would you rewrite this section to make this
clearer?
> I might redefine this as 'exposure' of the device while in the process,
> where a fully automatic closed area might be 'low' and a large shopfloor
> where 20+ people have physical access to actually go physically touch
> the device or the equipment processing it as 'high', for example.

I've changed the metric to be called "first-stage-exposure".

> first-second-stage-gap: This factor is not formulated as a question.
> (Some later factors are not capitalized and punctuated consistently.)

Changed to:

first-second-stage-gap:
: how far and for how long does a board travel between being initialized
with a first-stage bootloader and being locked down so that changes to the
bootloader can no longer be made? For many situations, there is no distance
at all, as both occur in the same factory, but in other situations boards
are manufactured and tested in one location but initialized elsewhere.

At this point, I'm going to start a new email!

--
Michael Richardson <mcr+IETF@sandelman.ca>, Sandelman Software Works
 -= IPv6 IoT consulting =-
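PS: to make the "levels" vocabulary concrete, here is a toy three-level PKI
built with the Python "cryptography" package. The names and lifetimes are
made up for illustration, and of course a real root key lives in an offline
HSM rather than in process memory. Note that it is the path length
constraints that lock the PKI to three levels:

import datetime
from cryptography import x509
from cryptography.x509.oid import NameOID
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec

def make_key():
    return ec.generate_private_key(ec.SECP256R1())

def make_cert(subject_cn, subject_key, issuer_cn, issuer_key, path_len, is_ca):
    subject = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, subject_cn)])
    issuer = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, issuer_cn)])
    now = datetime.datetime.now(datetime.timezone.utc)
    return (
        x509.CertificateBuilder()
        .subject_name(subject)
        .issuer_name(issuer)
        .public_key(subject_key.public_key())
        .serial_number(x509.random_serial_number())
        .not_valid_before(now)
        .not_valid_after(now + datetime.timedelta(days=365))
        .add_extension(
            x509.BasicConstraints(ca=is_ca, path_length=path_len),
            critical=True)
        .sign(issuer_key, hashes.SHA256())
    )

root_key, sub_key, ee_key = make_key(), make_key(), make_key()
# Level 1: the root, self-signed; path_length=1 locks the PKI at three levels.
root = make_cert("Example Root", root_key, "Example Root", root_key, 1, True)
# Level 2: the online subordinate; path_length=0 means it may only issue EEs.
sub = make_cert("Example Sub CA", sub_key, "Example Root", root_key, 0, True)
# Level 3: the EE (e.g. an IDevID); not a CA, so no path length at all.
ee = make_cert("device-serial-0042", ee_key, "Example Sub CA", sub_key, None, False)

A device that pins "Example Root" and checks path lengths will reject a
four-level chain, which is the pki-level-locked question that comes up later.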
> 6.2
>
> "identity-pki-level: how deep are the IDevID certificates that are
> issued?"
>
> Would just clarify formulation a bit: how many levels of CAs exist
> above the certificates issued. (The 'depth' concept seems ambiguous and
> may cause confusion.)

I've added a reference back to section 5.1.

> identity-anchor-storage: 'Recover' the private key is a bit ambiguous -
> different quorums may be needed for a) making it available to sign a
> new CA vs. b) making it possible to actually copy the root key around
> to new devices/places. I would maybe use the term 'to access' here to
> avoid that debate altogether, since signature access is enough to
> compromise.

I've split this up. Since your review, I have added a section _Preservation
of CA and Trust Anchor private keys_ which I'll now reference. I've now
changed this to read:

identity-anchor-storage:
: how is the root CA key stored? An open description that might include
whether an HSM is used, or not, or even the model of it.

identity-shared-split-extra:
: referring to {{splittingnumbers}}, where a private key is split up into n
components, of which k are required to recover the key, this number is n-k.
This is the number of spare shares. Publishing this provides a measure of
how much redundancy is present while not actually revealing either k or n.

identity-shared-split-continents:
: the number of continents on which the private key can be recovered without
travel by any of the secret share holders.

I've thought a lot about whether one should reveal k, or n, and decided that
it is safe to reveal n-k. This does not reveal n or k, or the relationship
between them, but it does tell an external entity how much redundancy there
is. (A toy sketch of the splitting arithmetic appears in the PS at the end
of this message.)

} In this document, each of the people who hold a piece of the secret are
} referred to as Key Executives.

I would sure like comments on n-k, and on this term Key Executives. I
re-read shamir79 (having found a new copy, since the URL I had went stale),
and I did not find a clear term for the holder of the D_i.

In rewriting things, I have created some new text in this patch:
https://github.com/mcr/idevid-security-considerations/pull/4/commits/503cec4...
about secret sharing. More review would be very welcome.

> Potentially some indicators to add could include how wide the issuing
> CA access is stretched, if that can be captured in a simple question.

What do you mean by "captured in a simple question"?

> The easiest way to deploy a subCA is to give everyone a copy of the
> private key, and then you can't place a whole lot of trust on the CA
> itself. Handing off subCAs vs. having a centralized entity controlling
> all issuing CAs does make a difference in possibility for
> administrative oversight.

Yes. Multiple subCAs in different places is, I think, a third way.

} This might be a very good way to manage a code signing or update signing key.
} Split it among development groups in three time zones (eight hours apart),
} such that any of those development groups can issue an emergency security
} patch.
} (Another way would be to have three End-Entity certificates that can sign
} code, and have each time zone sign their own code. That implies that there
} is at least a level-two PKI around the code signing process, and that any
} bootloaders that need to verify the code before starting it are able to do
} PKI operations.)

> 6.3

more in another email.

--
Michael Richardson <mcr+IETF@sandelman.ca>, Sandelman Software Works
 -= IPv6 IoT consulting =-
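PS: to make the n, k, and n-k relationship concrete, here is a toy sketch of
shamir79-style secret splitting in Python. The field size, names, and the
example numbers are mine for illustration; this is in no way production code:

import random

PRIME = 2**127 - 1  # toy field; must be larger than the secret

def split(secret, n, k):
    """Split secret into n shares; any k of them recover it."""
    # f(x) = secret + c1*x + ... + c_{k-1}*x^{k-1}  (mod PRIME)
    # random.randrange is NOT a CSPRNG; a real splitter would use secrets.
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(k - 1)]
    def f(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, f(x)) for x in range(1, n + 1)]  # share i is D_i = (i, f(i))

def recover(shares):
    """Lagrange interpolation at x=0, given any k distinct shares."""
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * -xj % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

n, k = 5, 3                     # five Key Executives, any three suffice
shares = split(0xC0FFEE, n, k)
assert recover(shares[:k]) == 0xC0FFEE
print("spare shares (n-k):", n - k)  # reveals neither n nor k

Publishing only n-k = 2 tells an observer that there are two spare shares,
but not whether the scheme is 3-of-5 or 7-of-9.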
> pki-level: similar consideration here, 'deep' is a bit ambiguous. Not
> sure what the EE level to evaluate actually implies of the integrity of
> the trust anchors, it seems more a category for e.g. integrity and
> privacy of 'software signing PKI' or whatever we want to call this one-
> to-many devices relationship vs. the many devices to 'one' integration
> target in identities.

It's supposed to be evaluated according to the PKI level section.

> pki-algorithms: potentially include also the timespan they are
> considered valid or in active use, cf. keylength.org - it makes a big
> difference if you trust the lowest rung of security for the next
> several decades or just a year or two.

Yes, I agree with you that the algorithms in use have best-before dates, but
since we are talking about an anchor that is baked into the device, it will
be baked in with a specific algorithm and strength. In some cases a software
update can replace those anchors; in other cases, there are operational
processes (the RFC7030 EST cacerts request) that could replace/update the
anchors.

But let's say that this is the self-signed (EE) anchor that is used to talk
to the cloud service (which you commented was missing). We could indicate
here when this anchor is going to be replaced, but how can we know for sure?

> - potentially also could differentiate here between algorithms actually
> used in the PKI vs. algorithms the device in general is capable of,
> because if they are not used in the PKI (and that is more painful to
> enumerate) then their appearance is mostly irrelevant or a sign of
> compromise. The current formulation implies the device is able to
> handle these algorithms but if they are not used in the PKI now or
> tomorrow they are irrelevant.

Yes, this is a good comment, and I'll adjust it.

> (Generally algorithm filtering in semi-closed systems is best achieved
> by just not signing certificates that are worse, not so much at

Agreed, so I will change the above metric to be about what is supported,
rather than what is actually in use.

> pki-level-locked: This formulation is unclear. I think what it means is
> whether X.509 path level constraints are applied in the root and/or
> below. It is most relevant for the online subCAs if they are "handed
> around" as opposed to being centrally controlled, i.e. can the subCA
> quietly sign another subCA.

Exactly. It's not so much whether it happens quietly; it's whether the
device, when it has been operating with a level-3 PKI, can deal with a
level-4 PKI.

> pki-breadth: maybe hard to answer and potentially largely irrelevant as
> I can't see outside public web CAs anyone coming to wag a finger if a
> vendor suddenly becomes super popular and issues certificates for an
> exponentially larger number of end entities with a wider series of CAs
> than initially deployed.

These are private PKIs, so their fingers can stay in their coat pockets :-)
What I care about is whether the database is going to die after 10 code
signing keys have been issued. Will they lose track of stuff? I'm thinking
about CRL and OCSP issues here. Just tell us a bit about your planning.

Remember that for some situations, a single-core laptop locked in a filing
cabinet is *just fine* to sign the three software updates issued a year for
your heat pump. The goal is not the "highest" security, but rather an
*appropriate* level of security.

> All they really need to do is add storage.
> The load on an individual issuing CA is more interesting from a
> few different points of view than the capacity of the full and very
> expandable PKI tree. Though of course there might be some actual
> limitations in some management software implementations too.

If you think it's useless, I can remove it.

> "pki-lock-policy: can any EE certificate be used with this trust
> anchor to sign? Or, is there some kind of policy OID or Subject
> restriction? Are specific subordinate CAs needed that lead to the EE?"
>
> It is unclear here what the first sentence means, 'to sign what'
> basically. OID references are good. I would replace 'needed' with
> 'required' or 'demanded' as they are often constraints of systems
> rather than actual needs.

Consider an EE cert used for software updates. Can any EE cert from the
trusted CA sign software updates, or does it need to be blessed in some way?

--
Michael Richardson <mcr+IETF@sandelman.ca>, Sandelman Software Works
 -= IPv6 IoT consulting =-
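PS: to illustrate what I mean by "blessed", here is a small sketch of the
kind of check a device could make before accepting an update signer. It uses
the Python "cryptography" package; the function name, and the choice of the
code-signing EKU as the blessing, are mine for illustration and not text
from the draft:

from cryptography import x509
from cryptography.x509.oid import ExtendedKeyUsageOID

def may_sign_updates(ee_cert: x509.Certificate) -> bool:
    """pki-lock-policy check: is this EE certificate blessed for updates?

    Chain validation back to the device's trust anchor is assumed to have
    already succeeded; this only decides whether *any* EE certificate under
    the anchor may sign updates, or only specially marked ones.
    """
    try:
        eku = ee_cert.extensions.get_extension_for_class(x509.ExtendedKeyUsage)
    except x509.ExtensionNotFound:
        return False  # an unmarked EE certificate is not blessed
    return ExtendedKeyUsageOID.CODE_SIGNING in eku.value

A Subject restriction or a policy OID check would slot into the same place.
The point is that the device's acceptance rule, whatever it is, is something
a manufacturer could disclose.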