              =========================================
             == Building new protocols, the scapy way ==
             ==            Fred Raynal                ==
             ==      fred(at)security-labs.org        ==
              =========================================


This article explains how to build a new protocol within scapy. There
are 2 main objectives:

  - Dissecting: this is done when a packet is received (from the
    network or a file) and should be converted to scapy's internals.

  - Building: when one wants to send such a new packet, some fields
    need to be adjusted automatically in it.



===============================
=  Not "packets" but "layers" =
===============================

Before digging into dissection itself, let us look at how packets are
organized.

>>> p = IP()/TCP()/"AAAA"
>>> p
<IP  frag=0 proto=TCP |<TCP  |<Raw  load='AAAA' |>>>
>>> p.summary()
'IP / TCP 127.0.0.1:ftp-data > 127.0.0.1:www S / Raw'

We are interested in 2 "inside" fields of the class Packet:
  - p.underlayer
  - p.payload

And here is the main "trick": you do not care about packets, only
about layers, stacked one after the other.

One can easily access a layer by its name: p[TCP] returns the TCP
layer and the layers following it. This is a shortcut for
p.getlayer(TCP).

  Tip: there is an optional argument (nb) which returns the nb-th
    layer of the required protocol.

Let's put everything together now, playing with the TCP layer:

>>> tcp=p[TCP]
>>> tcp.underlayer
<IP  frag=0 proto=TCP |<TCP  |<Raw  load='AAAA' |>>>
>>> tcp.payload
<Raw  load='AAAA' |>

As expected, tcp.underlayer points to the beginning of our IP packet,
and tcp.payload to its payload.
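
The stacking idea can be sketched without scapy at all. The snippet
below is only a toy model written for Python 3: the class Layer and
its methods are made up for illustration and are not scapy's Packet
implementation. It shows how "/" can link layers through underlayer
and payload, and how getlayer() walks the stack.

```python
# Toy model of stacked layers (standalone sketch, NOT scapy's Packet class).
class Layer:
    def __init__(self, name):
        self.name = name
        self.underlayer = None   # layer just below this one (None at the bottom)
        self.payload = None      # layer just above this one (None at the top)

    def __truediv__(self, other):
        # Attach `other` on top of the last layer of this stack.
        top = self
        while top.payload is not None:
            top = top.payload
        top.payload = other
        other.underlayer = top
        return self

    def getlayer(self, name, nb=1):
        # Walk up the stack and return the nb-th layer called `name`.
        layer = self
        while layer is not None:
            if layer.name == name:
                nb -= 1
                if nb == 0:
                    return layer
            layer = layer.payload
        return None


p = Layer("IP") / Layer("TCP") / Layer("Raw")
tcp = p.getlayer("TCP")
print(tcp.underlayer.name, tcp.payload.name)   # prints: IP Raw
```

Each layer only knows its two neighbours, which is why the rest of
this article can reason layer by layer instead of packet by packet.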


* Building a new layer
======================

VERY EASY! A layer is mainly a list of fields. Let's look at the UDP
definition:

    class UDP(Packet):
        name = "UDP"
        fields_desc = [ ShortEnumField("sport", 53, UDP_SERVICES),
                        ShortEnumField("dport", 53, UDP_SERVICES),
                        ShortField("len", None),
                        XShortField("chksum", None), ]

And you are done! There are many fields already defined for
convenience, look at the doc^W sources as Phil would say.

So, defining a layer is simply gathering fields in a list. The goal
here is to provide suitable default values for each field, so that the
user does not have to give them when building a packet.

The main mechanism is based on the Field structure. Always keep in
mind that a layer is just a little more than a list of fields, but not
much more.

So, to understand how layers work, one needs to look quickly at how
the fields are handled.


* Manipulating packets == manipulating its fields
=================================================

A field should be considered in different states:
  - i(nternal): this is the way scapy manipulates it.
  - m(achine): this is where the truth is, that is the layer as it is
    on the network.
  - h(uman): how the packet is displayed to our human eyes.

This explains the mysterious methods i2h(), i2m(), m2i() and so on
available in each field: they are conversions from one state to
another, adapted to a specific use.

Other special functions:
  - any2i() guesses the input representation and returns the internal
    one.
  - i2repr() is a nicer i2h().
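
To make the three states concrete, here is a small standalone sketch
in Python 3. ToyShortEnumField and the SERVICES table are hypothetical
names invented for this example; unlike scapy's real fields, the
methods take no pkt argument, to keep things short.

```python
import struct

# Hypothetical table of well-known ports, for illustration only.
SERVICES = {"ftp-data": 20, "domain": 53, "www": 80}


class ToyShortEnumField:
    """16-bit field with i(nternal), m(achine) and h(uman) views."""

    def __init__(self, name, default, names):
        self.name = name
        self.default = default
        self.by_name = names
        self.by_value = {v: k for k, v in names.items()}

    def any2i(self, x):
        # Accept either a service name or a number; return the internal int.
        return self.by_name[x] if isinstance(x, str) else int(x)

    def i2m(self, x):
        # internal int -> 2 bytes in network byte order
        return struct.pack("!H", x)

    def m2i(self, raw):
        # 2 bytes in network byte order -> internal int
        return struct.unpack("!H", raw)[0]

    def i2h(self, x):
        # internal int -> service name if known, else the number itself
        return self.by_value.get(x, x)


dport = ToyShortEnumField("dport", 80, SERVICES)
i = dport.any2i("www")                 # human input  -> internal (80)
m = dport.i2m(i)                       # internal     -> machine (b"\x00P")
h = dport.i2h(dport.m2i(b"\x00\x35"))  # machine      -> internal -> human
```

Here the internal view is a plain integer, the machine view is two
bytes, and the human view is a service name when one is known.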

However, all these are "low level" functions. The functions adding a
field to, or extracting it from, the current layer are:

  - addfield(self, pkt, s, val): appends the network representation of
    the value val of the field (belonging to layer pkt) to the raw
    string packet s.

      class StrFixedLenField(StrField):
          def addfield(self, pkt, s, val):
              return s+struct.pack("%is"%self.length,self.i2m(pkt, val))

  - getfield(self, pkt, s): extracts from the raw packet s the field
    value belonging to layer pkt. It returns a pair: the first element
    is the raw packet string after having removed the extracted field,
    the second one is the extracted field itself in internal
    representation.

      class StrFixedLenField(StrField):
          def getfield(self, pkt, s):
              return s[self.length:], self.m2i(pkt,s[:self.length])
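
The pair addfield()/getfield() can be tried in isolation. The sketch
below is a simplified Python 3 stand-in (ToyStrFixedLenField is a
made-up name, it works on bytes and drops the pkt argument); it is
only meant to show that one method appends to the raw data while the
other consumes from it.

```python
import struct


class ToyStrFixedLenField:
    def __init__(self, name, default, length):
        self.name, self.default, self.length = name, default, length

    def addfield(self, s, val):
        # Append val to s, padded with NUL bytes (or truncated) to self.length.
        return s + struct.pack("%is" % self.length, val)

    def getfield(self, s):
        # Return (remaining bytes, extracted value).
        return s[self.length:], s[:self.length]


tag = ToyStrFixedLenField("tag", b"", 4)
raw = tag.addfield(b"\x01", b"ab")              # b"\x01ab\x00\x00"
rest, val = tag.getfield(raw[1:] + b"tail")     # consumes exactly 4 bytes
```

Note that getfield() hands back what is left of the input: this is
what lets the next field of the layer continue where this one stopped.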


When defining your own layer, you usually just need to define some
*2*() methods, and sometimes also addfield() and getfield().


* Example : variable length quantities
======================================

There is a way to represent integers as variable length quantities
that is often used in protocols, for instance when dealing with
signal processing (e.g. MIDI).

Each byte carries 7 bits of the number and is coded with the MSB set
to 1, except the last byte. For instance, 0x123456 will be coded as
0xC8E856:

def vlenq2str(l):
    # emit 7 bits per byte, least significant group first
    s = []
    s.append( l & 0x7F )               # last byte: MSB cleared
    l = l >> 7
    while l>0:
        s.append( 0x80 | (l & 0x7F) )  # other bytes: MSB set
        l = l >> 7
    s.reverse()
    return "".join(map(chr, s))

def str2vlenq(s=""):
    i = l = 0
    while i<len(s) and ord(s[i]) & 0x80:
        l = l << 7
        l = l + (ord(s[i]) & 0x7F)
        i = i + 1
    if i == len(s):
        # every byte had its MSB set: do not read past the end
        warning("Broken vlenq: no ending byte")
        return "", l
    l = l << 7
    l = l + (ord(s[i]) & 0x7F)

    return s[i+1:], l
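
For readers who want to experiment with the encoding outside scapy,
here is a sketch of the same two helpers for Python 3, working on
bytes. The names vlenq_encode() and vlenq_decode() are chosen for this
example only, and raising ValueError on a truncated input is a design
choice of the sketch, not something taken from scapy.

```python
def vlenq_encode(n):
    """Encode a non-negative int; every byte but the last has its MSB set."""
    out = [n & 0x7F]                     # last byte: MSB clear
    n >>= 7
    while n > 0:
        out.append(0x80 | (n & 0x7F))    # earlier bytes: MSB set
        n >>= 7
    out.reverse()
    return bytes(out)


def vlenq_decode(data):
    """Return (remaining bytes, decoded int); raise ValueError if truncated."""
    value = 0
    for i, byte in enumerate(data):
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:              # MSB clear: this was the last byte
            return data[i + 1:], value
    raise ValueError("broken vlenq: no ending byte")


encoded = vlenq_encode(0x123456)         # b"\xc8\xe8\x56"
rest, value = vlenq_decode(encoded + b"more")
```

Working it by hand for 0x123456: the 7-bit groups are 0x48, 0x68 and
0x56 from most to least significant, and setting the MSB on all but
the last one gives 0xC8, 0xE8, 0x56.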


We will define a field which automatically computes the length of an
associated string, but uses that encoding format.


class VarLenQField(Field):
    """ variable length quantities """

    def __init__(self, name, default, fld):
        Field.__init__(self, name, default)
        self.fld = fld

    def i2m(self, pkt, x):
        if x is None:
            # no value given: use the length of the associated field
            f = pkt.get_field(self.fld)
            x = f.i2len(pkt, pkt.getfieldval(self.fld))
        return vlenq2str(x)

    def m2i(self, pkt, x):
        if x is None:
            return None
        return str2vlenq(x)[1]

    def addfield(self, pkt, s, val):
        return s+self.i2m(pkt, val)

    def getfield(self, pkt, s):
        return str2vlenq(s)


And now, define a layer using this kind of field:

class FOO(Packet):
    name = "FOO"
    fields_desc = [ VarLenQField("len", None, "data"),
                    StrLenField("data", "", "len") ]

>>> f = FOO(data="A"*129)
>>> f.show()
###[ FOO ]###
  len= 0
  data= 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'

Here, len is not yet computed and only the default values are
displayed. This is the current internal representation of our
layer. Let's force the computation now:

>>> f.show2()
###[ FOO ]###
  len= 129
  data= 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'

The method show2() displays the fields with their values as they will
be sent to the network, but in a human readable way, so we see
len=129. Last but not least, let us look now at the machine
representation:

>>> str(f)
'\x81\x01AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'

The first 2 bytes are \x81\x01, which is 129 in this encoding
(129 = 1*128 + 1, so the first byte is 0x80|1 and the last byte is 1).


==============
= Dissecting =
==============

Layers are only lists of fields, but what is the glue between the
fields, and then between the layers? These are the mysteries explained
in this section.

* The basic stuff
=================

The core function for dissection is Packet.dissect():

    def dissect(self, s):
        s = self.pre_dissect(s)
        s = self.do_dissect(s)
        s = self.post_dissect(s)
        payl,pad = self.extract_padding(s)
        self.do_dissect_payload(payl)
        if pad and conf.padding:
            self.add_payload(Padding(pad))

When called, s is a string containing what is going to be
dissected. self points to the current layer.

>>> p=IP("A"*20)/TCP("B"*32)
WARNING: bad dataofs (4). Assuming dataofs=5
>>> p
<IP  version=4L ihl=1L tos=0x41 len=16705 id=16705 flags=DF frag=321L ttl=65 proto=65 chksum=0x4141 src=65.65.65.65 dst=65.65.65.65 |<TCP  sport=16962 dport=16962 seq=1111638594L ack=1111638594L dataofs=4L reserved=2L flags=SE window=16962 chksum=0x4242 urgptr=16962 options=[] |<Raw  load='BBBBBBBBBBBB' |>>>

Packet.dissect() is called 3 times:
1. to dissect the "A"*20 as an IPv4 header
2. to dissect the "B"*32 as a TCP header
3. and since there are still 12 bytes in the packet, they are
   dissected as "Raw" data (which is some kind of default layer type)


For a given layer, everything is quite straightforward.

- pre_dissect() is called to prepare the layer.

- do_dissect() performs the real dissection of the layer.

- post_dissect() is called when some updates are needed on the
  dissected inputs (e.g. deciphering, uncompressing, ... )

- extract_padding() is an important function which should be called
  by every layer containing its own size, so that it can tell apart
  in the payload what is really related to this layer and what will
  be considered as additional padding bytes.

- do_dissect_payload() is the function in charge of dissecting the
  payload (if any). It is based on guess_payload_class() (see
  below). Once the type of the payload is known, the payload is bound
  to the current layer with this new type:

      def do_dissect_payload(self, s):
          cls = self.guess_payload_class(s)
          p = cls(s, _internal=1, _underlayer=self)
          self.add_payload(p)

At the end, all the layers in the packet are dissected, and glued
together with their known types.


* Dissecting fields
===================

The method with all the magic between a layer and its fields is
do_dissect(). If you have understood the different representations of
a layer, you should understand that "dissecting" a layer is building
each of its fields from the machine to the internal representation.

Guess what? That is exactly what do_dissect() does:

    def do_dissect(self, s):
        flist = self.fields_desc[:]
        flist.reverse()
        while s and flist:
            f = flist.pop()
            s,fval = f.getfield(self, s)
            self.fields[f] = fval
        return s

So, it takes the raw string packet, and feeds each field with it, as
long as there are both data and fields remaining.

    >>> FOO("\xff\xff"+"B"*8)
    <FOO  len=2097090 data='BBBBBBB' |>

When writing FOO("\xff\xff"+"B"*8), it calls do_dissect(). The first
field is VarLenQField. Thus, it takes bytes as long as their MSB is
set, thus until (and including) the first 'B'. This mapping is done
thanks to VarLenQField.getfield() and can be cross-checked:

    >>> vlenq2str(2097090)
    '\xff\xffB'

Then, the next field is extracted the same way, until 2097090 bytes
are put in FOO.data (or less if 2097090 bytes are not available, as
here).

If there are some bytes left after the dissection of the current
layer, they are mapped in the same way to what the next layer is
expected to be (Raw by default):

    >>> FOO("\x05"+"B"*8)
    <FOO  len=5 data='BBBBB' |<Raw  load='BBB' |>>
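
The do_dissect() loop shown earlier can be reproduced in a few lines
of standalone Python 3. ToyByteField, ToyShortField and the free
function do_dissect() below are invented for this sketch and are much
simpler than scapy's classes; they only illustrate how each field
consumes its share of the data and passes the rest along.

```python
import struct


class ToyByteField:
    def __init__(self, name):
        self.name = name

    def getfield(self, s):
        # consume 1 byte
        return s[1:], s[0]


class ToyShortField:
    def __init__(self, name):
        self.name = name

    def getfield(self, s):
        # consume 2 bytes, network byte order
        return s[2:], struct.unpack("!H", s[:2])[0]


def do_dissect(fields_desc, s):
    """Feed raw bytes to each field in order; return (values, leftover)."""
    values = {}
    for f in fields_desc:
        if not s:                 # stop as soon as the data runs out
            break
        s, values[f.name] = f.getfield(s)
    return values, s


layout = [ToyByteField("type"), ToyShortField("port")]
values, leftover = do_dissect(layout, b"\x01\x00\x50rest")
```

The leftover bytes are exactly what would be handed to the next layer
(Raw by default in scapy).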

Hence, we need now to understand how layers are bound together.


* Binding layers
================

One of the cool features of scapy when dissecting layers is that it
tries to guess for us what the next layer is. The official way to link
2 layers is using bind_layers().

For instance, if you have a class HTTP, you may expect that all the
packets coming from or going to port 80 will be decoded as such. This
is simply done that way:

    bind_layers( TCP, HTTP, sport=80 )
    bind_layers( TCP, HTTP, dport=80 )


That's all folks! Now every packet related to port 80 will be
associated to the layer HTTP, whether it is read from a pcap file or
received from the network.

* the guess_payload_class() way

Sometimes, guessing the payload class is not as straightforward as
defining a single port. For instance, it can depend on the value of a
given byte in the current layer. The 2 needed methods are:

  - guess_payload_class() which must return the guessed class for the
    payload (next layer). By default, it uses links between classes
    that have been put in place by bind_layers().

  - default_payload_class() which returns the default value. This
    method defined in the class Packet returns Raw, but it can be
    overloaded.

For instance, decoding 802.11 changes depending on whether it is
ciphered or not:

    class Dot11(Packet):
        def guess_payload_class(self, payload):
            if self.FCfield & 0x40:
                return Dot11WEP
            else:
                return Packet.guess_payload_class(self, payload)

Several comments are needed here:

- this cannot be done using bind_layers() because the tests are
  supposed to be "field==value", but it is more complicated here as we
  test a single bit in the value of a field.
- if the test fails, no assumption is made, and we fall back to the
  default guessing mechanism by calling Packet.guess_payload_class().

Most of the time, defining a method guess_payload_class() is not a
necessity as the same result can be obtained from bind_layers().

* Changing the default behavior

If you do not like scapy's behavior for a given layer, you can either
change or disable it through a call to split_layers(). For instance,
if you do not want UDP/53 to be bound with DNS, just add in your code:

    split_layers(UDP, DNS, sport=53)

Now every packet with source port 53 will not be handled as DNS, but
as whatever you specify instead.



* Under the hood : putting everything together
==============================================

In fact, each layer has a field payload_guess. When you use the
bind_layers() way, it adds the defined next layers to that list.

    >>> p=TCP()
    >>> p.payload_guess
    [({'dport': 2000}, <class 'scapy.Skinny'>), ({'sport': 2000}, <class
    'scapy.Skinny'>), ... )]

Then, when it needs to guess the next layer class, it calls the
default method Packet.guess_payload_class(). This method runs through
each element of the list payload_guess, each element being a
tuple:
  - the 1st value is a field to test ('dport': 2000)
  - the 2nd value is the guessed class if it matches (Skinny)

So, the default guess_payload_class() tries all elements in the list,
until one matches. If no element matches, it then calls
default_payload_class(). If you have redefined this method, then yours
is called, otherwise, the default one is called, and the Raw type is
returned.

  Packet.guess_payload_class()
    - test what is in field payload_guess
    - call default_payload_class() (possibly overloaded) if nothing
      matches
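
The whole guessing mechanism fits in a short standalone sketch
(Python 3). Everything below is a toy: the classes and the functions
bind_layers() and split_layers() only mimic the behavior described
above and are not scapy's code; in particular, fields are kept in a
plain dict.

```python
class Layer:
    payload_guess = ()          # replaced per class by bind_layers()

    def __init__(self, **fields):
        self.fields = fields

    def default_payload_class(self):
        return Raw

    def guess_payload_class(self):
        # Try each (conditions, class) pair in order; first full match wins.
        for conditions, cls in self.payload_guess:
            if all(self.fields.get(k) == v for k, v in conditions.items()):
                return cls
        return self.default_payload_class()


class Raw(Layer):
    pass


class TCP(Layer):
    pass


class HTTP(Layer):
    pass


def bind_layers(lower, upper, **conditions):
    # Register "if lower has these field values, upper comes next".
    lower.payload_guess = list(lower.payload_guess) + [(conditions, upper)]


def split_layers(lower, upper, **conditions):
    # Remove a binding previously registered with the same conditions.
    lower.payload_guess = [g for g in lower.payload_guess
                           if g != (conditions, upper)]


bind_layers(TCP, HTTP, sport=80)
bind_layers(TCP, HTTP, dport=80)
guessed = TCP(sport=1234, dport=80).guess_payload_class()   # HTTP
fallback = TCP(sport=1234, dport=22).guess_payload_class()  # Raw
```

Calling split_layers(TCP, HTTP, dport=80) afterwards removes the
second binding only, so traffic to port 80 falls back to Raw while
traffic from port 80 is still guessed as HTTP.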


============
= Building =
============

Building a packet is as simple as building each layer. Then, some
magic happens to glue everything. Let's do magic then.

* The basic stuff
=================

The first thing to establish: what does "build" mean? As we have seen,
a layer can be represented in different ways (human, internal,
machine). Building means going to the machine format.

The second thing to understand is _when_ a layer is built. The answer
is not that obvious, but as soon as you need the machine
representation, the layers are built: when the packet is dropped on
the network or written to a file, when it is converted to a string,
... In fact, the machine representation should be regarded as a big
string with the layers appended one after the other.

    >>> p = IP()/TCP()
    >>> hexdump(p)
    0000 45 00 00 28 00 01 00 00 40 06 7C CD 7F 00 00 01 E..(....@.|.....
    0010 7F 00 00 01 00 14 00 50 00 00 00 00 00 00 00 00 .......P........
    0020 50 02 20 00 91 7C 00 00 P. ..|..

Calling str() builds the packet:
  - non-instantiated fields are set to their default value
  - lengths are updated automatically
  - checksums are computed
  - and so on.

In fact, using str() rather than show2() or any other method is not a
random choice, as all the functions building the packet call
Packet.__str__(). However, __str__() calls another method: build():

    def __str__(self):
        return self.__iter__().next().build()

What is also important to understand is that usually, you do not care
about the machine representation, that is why the human and internal
representations are here.

So, the core method is build() (the code has been shortened to keep
only the relevant parts):

    def build(self,internal=0):
        pkt = self.do_build()
        pay = self.build_payload()
        p = self.post_build(pkt,pay)
        if not internal:
            pkt = self
            while pkt.haslayer(Padding):
                pkt = pkt.getlayer(Padding)
                p += pkt.load
                pkt = pkt.payload
        return p

So, it starts by building the current layer, then the payload, and
post_build() is called to update some late evaluated fields (like
checksums). Last, the padding is added to the end of the packet.

Of course, building a layer is the same as building each of its
fields, and that is exactly what do_build() does.

* Building fields
=================

The building of each field of a layer is called in Packet.do_build():

    def do_build(self):
        p=""
        for f in self.fields_desc:
            p = f.addfield(self, p, self.getfieldval(f))
        return p

The core function to build a field is addfield(). It takes the
internal view of the field and puts it at the end of p. Usually, this
method calls i2m() and returns something like p + self.i2m(pkt, val)
(where val=self.getfieldval(f)).

If val is set, then i2m() is just a matter of formatting the value the
way it must be. For instance, if a byte is expected, struct.pack("B",
val) is the right way to convert it.

However, things are more complicated if val is not set: it means no
default value was provided earlier, and thus the field needs to
compute some "stuff" right now or later.

"Right now" means thanks to i2m(), if all the pieces of information
are available. For instance, if you have to handle a length until a
certain delimiter.

Ex: counting the length until a delimiter

    class XNumberField(FieldLenField):

        def __init__(self, name, default, sep="\r\n"):
            # NOTE: fld (the name of the field whose length is counted)
            # is not defined in this excerpt
            FieldLenField.__init__(self, name, default, fld)
            self.sep = sep

        def i2m(self, pkt, x):
            x = FieldLenField.i2m(self, pkt, x)
            return "%02x" % x

        def m2i(self, pkt, x):
            return int(x, 16)

        def addfield(self, pkt, s, val):
            return s+self.i2m(pkt, val)

        def getfield(self, pkt, s):
            # the value runs up to (not including) the separator
            sep = s.find(self.sep)
            return s[sep:], self.m2i(pkt, s[:sep])


In this example, in i2m(), if x already has a value, it is converted
to its hexadecimal representation. If no value is given, a length of
"0" is returned.

The glue is provided by Packet.do_build() which calls Field.addfield()
for each field in the layer, which in turn calls Field.i2m(): the
layer is built IF a value was available.


* Handling default values: post_build
=====================================

A default value for a given field is sometimes either not known or
impossible to compute when the fields are put together. For instance,
if we used a XNumberField as defined previously in a layer, we expect
it to be set to a given value when the packet is built. However,
nothing is returned by i2m() if it is not set.

The answer to this problem is Packet.post_build().

When this method is called, the packet is already built, but some
fields still need to be computed. This is typically what is required
to compute checksums or lengths. In fact, this is required each time a
field's value depends on something which is not in the current layer.

So, let us assume we have a packet with a XNumberField, and have a
look at its building process:


class Foo(Packet):
    fields_desc = [
        ByteField("type", 0),
        XNumberField("len", None, "\r\n"),
        StrFixedLenField("sep", "\r\n", 2)
        ]

    def post_build(self, p, pay):
        if self.len is None and pay:
            # patch the payload length, in hex, right after the type byte
            l = len(pay)
            p = p[:1] + hex(l)[2:]+ p[2:]
        return p+pay


When post_build() is called, p is the current layer, pay the payload,
that is what has already been built. We want our length to be the full
length of the data put after the separator, so we add its computation
in post_build().

>>> p = Foo()/("X"*32)
>>> p.show2()
###[ Foo ]###
  type= 0
  len= 32
  sep= '\r\n'
###[ Raw ]###
     load= 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'

len is correctly computed now:

>>> hexdump(str(p))
0000   00 32 30 0D 0A 58 58 58  58 58 58 58 58 58 58 58   .20..XXXXXXXXXXX
0010   58 58 58 58 58 58 58 58  58 58 58 58 58 58 58 58   XXXXXXXXXXXXXXXX
0020   58 58 58 58 58                                     XXXXX

And the machine representation is the expected one.
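
The patching done in post_build() can be replayed on plain bytes. The
function below is a standalone Python 3 sketch, not the method of the
Foo class above. It assumes that the built header reserves exactly two
hexadecimal digits for the length and that the payload is shorter than
256 bytes; both are assumptions of this example.

```python
def post_build(p, pay):
    """p: built header (type byte + 2 hex digits + CRLF); pay: built payload."""
    length = b"%02x" % len(pay)          # payload length as two hex digits
    # keep the type byte, overwrite the 2-digit placeholder, keep the rest
    return p[:1] + length + p[3:] + pay


header = b"\x00" + b"00" + b"\r\n"       # type=0, placeholder "00", separator
frame = post_build(header, b"X" * 32)
```

With a 32-byte payload the frame starts with the bytes 00 32 30 0D 0A,
the same five bytes as in the hexdump above ("20" is 32 in
hexadecimal).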


* Handling default values: automatic computation
================================================

As we have previously seen, the dissection mechanism is built upon the
links between the layers created by the programmer. However, it can
also be used during the building process.

In the layer Foo(), our first byte is the type, which defines what
comes next, e.g. if type=1, next layer is Bar1, if it is 2, next layer
is Bar2, and so on. We would like then this field to be set
automatically according to what comes next.

class Bar1(Packet):
    fields_desc = [
          IntField("val", 0),
          ]

class Bar2(Packet):
    fields_desc = [
          IPField("addr", "127.0.0.1")
          ]

If we use these classes with nothing else, we will have trouble when
dissecting the packets as nothing binds the Foo layer with the
multiple Bar*:

    >>> p = Foo()/Bar1(val=1337)
    >>> p
    <Foo  |<Bar1  val=1337 |>>
    >>> p.show2()
    ###[ Foo ]###
      type= 0
      len= 4
      sep= '\r\n'
    ###[ Raw ]###
        load= '\x00\x00\x059'

Problems:
  1. type is still equal to 0 while we wanted it to be automatically
     set to 1. We could of course have built p with
         p = Foo(type=1)/Bar1(val=1337)
     but this is not very convenient.
  2. the packet is badly dissected as Bar1 is regarded as Raw. This
     is because no links have been set between Foo() and Bar*().


As previously, we use bind_layers() to set everything correctly for us:

    bind_layers( Foo, Bar1, type=1 )
    bind_layers( Foo, Bar2, type=2 )

Now, all the magic is there:

    >>> p = Foo()/Bar1(val=0x1337)
    >>> p
    <Foo  type=1 |<Bar1  val=4919 |>>
    >>> p.show2()
    ###[ Foo ]###
      type= 1
      len= 4
      sep= '\r\n'
    ###[ Bar1 ]###
        val= 4919L

Our 2 problems have been solved without us doing much: so good to be
lazy :)

* Under the hood : putting everything together
==============================================

Last but not least, it is very useful to understand when each function
is called when a packet is built:

>>> hexdump(str(p))
Packet.str=Foo
Packet.iter=Foo
Packet.iter=Bar1
Packet.build=Foo
Packet.build=Bar1
Packet.post_build=Bar1
Packet.post_build=Foo

As you can see, it first runs through the list of layers, and then
builds them starting from the beginning. Once all layers have been
built, it then calls post_build() starting from the end.
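
This ordering falls out of the recursion in build(): a layer builds
its own fields, then asks its payload to build itself, and only then
runs post_build(). The standalone Python 3 sketch below reproduces the
build and post_build part of the trace with a toy Layer class (made up
for this example, with do_build() reduced to returning the layer
name).

```python
calls = []   # records the order in which the methods run


class Layer:
    def __init__(self, name, payload=None):
        self.name, self.payload = name, payload

    def do_build(self):
        # stand-in for "build every field of this layer"
        return self.name.encode()

    def post_build(self, pkt, pay):
        calls.append("post_build=" + self.name)
        return pkt + pay

    def build(self):
        calls.append("build=" + self.name)
        pkt = self.do_build()
        pay = self.payload.build() if self.payload else b""
        return self.post_build(pkt, pay)


raw = Layer("Foo", Layer("Bar1")).build()
print(calls)
# prints: ['build=Foo', 'build=Bar1', 'post_build=Bar1', 'post_build=Foo']
```

Because post_build() of the lower layer runs last, it sees the fully
built payload, which is what makes late length and checksum
computation possible.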



===========
= History =
===========

$Log: dissect.txt,v $
Revision 1.1  2007/01/31 10:58:21  raynal
Initial revision

