This defect has been investigated, a potential explanation found, and a suggested correction is given.
Please pardon my bad english, I'm french. (Like scapy author BTW, he he ! -censored patriotic claims-)
1) Context
Here at work we have a linux box (ubuntu "edgy", kernel 2.6.17-11, python 2.4) with several network interfaces with different addresses.
This machine is remotely controlled with data flowing through one interface.
We use UTscapy to unit test a box connected on another interface.
2) The problem
When sniffing with scapy on one interface, we seldom grab frames from another interface.
This is extremely embarrassing since these are unit tests (UTscapy), and unexpected frames are considered a failure.
Several instances of wireshark running on each interfaces do NOT show the spurious frames arriving on the other interface, so it is really a capture problem, not a bad network configuration.
3) Reproducing the problem
We have traced the problem to scapy himself. It can be easily reproduced with only scapy :
- set up a machine with two network interfaces, and maintain a constant, maximum network flow on one interface (say eth1)
- ensure there is no network flow on the other interface (say eth0)
- launch scappy and launch a capture on eth0 with this command :
Welcome to Scapy (v1.1.1 / -)
>>> for i in range(24) :
... print str(time.time())
... sniff(prn = lambda x: str(time.time()) + " " + x.summary(), iface='eth0', timeout=0.3)
...
note that the command is not trivial, it is tailored to maximize the probability of the bug occurence and to give interesting infos.
- sometimes, you'll see something like that : (log has been slightly edited)
...
1188550571.88
<Sniffed: UDP:0 TCP:0 ICMP:0 Other:0>
1188550572.21
<Sniffed: UDP:0 TCP:0 ICMP:0 Other:0>
1188550572.53
1188550572.55 Ether / IP / TCP [@]:x11_1 > [@]:39953 A
<Sniffed: UDP:0 TCP:1 ICMP:0 Other:0>
1188550572.86
<Sniffed: UDP:0 TCP:0 ICMP:0 Other:0>
1188550573.19
<Sniffed: UDP:0 TCP:0 ICMP:0 Other:0>
1188550573.52
1188550573.53 Ether / IP / TCP [@]:x11_1 > [@]:39953 A
<Sniffed: UDP:0 TCP:1 ICMP:0 Other:0>
1188550573.85
<Sniffed: UDP:0 TCP:0 ICMP:0 Other:0>
...
Here we can see that X11 frames seems to be leaking from eth1 to eth0.
Wireshark running on eth0 do not show those frames.
4) Investigating the problem
*** Warning : reading this may prevent you from having a fresh eye on the problem. You may want to investigate yourself first. ***
Internal knowledge of scapy is required.
While trying to reproduce the problem, we have found that the leaked frames always shows themselves at the very beginning of the capture.
That's why the previous command launch a lot of very small captures, to maximize the occurences of capture starts. It also dates the capture launches and frames receptions, to appreciate the time of each leak.
The results stated before made us think about a race condition. We specify an interface (eth0), but at which time does scapy takes this into account ? May it captures on every interface during a brief time ?
We then have searched at which point the specific interface is enforced.
We have traced back the problem to the L2ListenSocket used by scapy to sniff() frames. We saw that :
(scapy latest 31/08/2007 : line 9400)
self.ins = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(type))
self.ins.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 0)
if iface is not None:
self.ins.bind((iface, type))
... (stuff) ...
self.ins.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 2**30)
The specific capture interface is not enforced immediatly. The setting of SO_RCVBUF to 0 is obviously directed at preventing reception of data while initializing the socket.
The interface is then enforced (bind) and later, the RCVBUF is enlarged, allowing data reception again.
It is obviously possible that, between the socket creation and the setting of RCVBUF size to 0, some data may arrives in the socket, especially if the network load is heavy.
Now, according that putting RCVBUF to 0 stop reception, does it also flush the reception buffer ? Are we sure that no data is available when enlarging RCVBUF ?
We have instrumented scapy by modifying its source code :
self.ins = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(type))
self.ins.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 0)
if iface is not None:
self.ins.bind((iface, type))
... (stuff) ...
>>> debugCheckIfSocketIsReallyEmpty(self.ins) # YEJ 2007/08/31 XXX debug, to remove
self.ins.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 2**30)
And we have found that, indeed, the socket was not always empty at this time. We can reasonably state that putting RCVBUF size to 0 does NOT flush RCVBUF. When enlarging it again, previously received and unread data is available again.
So there is a brief time in the socket creation where it capture frames from every interfaces. This is an unexpected thing and it must be corrected.
L3PacketSocket, L2Socket and L2ListenSocket are affected.
5) Suggested correction
Warning : at request of our testing team, I have learnt python and scapy only in order to investigate this bug. Do not expect great code !
Assuming that putting RCVBUF size to 0 really stop reception, the correction is to flush the socket from data that may have arrived between its creation and the setting of its receive buffer size to 0.
First, a function to do that (since it will be used in the 3 LxxxSocket?)
def flushSocket(s):
"""Use this function to flush a receivingsocket.
This function will read data until no more data is present.
This function must be used on a socket wich don't receive data anymore (buffsize = 0 for example)
or this it may loop indefinitely, and return a socket which has already refilled itself.
This function is needeed because it looks like puting socket receive buffer to 0 don't flush it,
so when the buffer is enlarged, previous unread data is accessible.
"""
while True: # loop
sel = select([s],[],[],0) # data check. timeout = 0 => immediate return
if s in sel[0]: # is there data ?
print "(flushing socket)" # debug, show that this function was needed
aFrame = s.recv(MTU) # we consume the data
else:
break; # no more data => exit
And then, the socket initialisation is enriched :
self.ins = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(type))
self.ins.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 0)
>>> flushSocket(self.ins) # YEJ 2007/08/31 because putting RCVBUF to 0 don't flush the socket, data is still here and will be accessible when RCVBUF expands again
if iface is not None:
self.ins.bind((iface, type))
... (stuff) ...
>>> debugCheckIfSocketIsReallyEmpty(self.ins) # YEJ 2007/08/31 XXX debug, to remove
self.ins.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 2**30)
By restarting test 3), we have found that the problem don't seem to happen anymore (but maybe it only happen less frequently ;-) let's hope)
1188550857.07
<Sniffed: UDP:0 ICMP:0 TCP:0 Other:0>
1188550857.4
(flushing socket)
<Sniffed: UDP:0 ICMP:0 TCP:0 Other:0>
1188550857.73
<Sniffed: UDP:0 ICMP:0 TCP:0 Other:0>
1188550858.06
(flushing socket)
<Sniffed: UDP:0 ICMP:0 TCP:0 Other:0>
1188550858.38
<Sniffed: UDP:0 ICMP:0 TCP:0 Other:0>
1188550858.71
<Sniffed: UDP:0 ICMP:0 TCP:0 Other:0>
We can see that the correction is effective, the flush is seldom activated and effectively remove spurious frames.
6) Diff
Warning : of course, this not to be taken "as is", debug code should be removed.
mistral@mistral3:~/data/clearcase/msg_yej_dev_v001_MSG_snap/msg/_docs/STD/Scapy$ diff scapy.py scapy_corrected.py
9219a9220,9249
> # YEJ 2007/08/31 start
> # looks where those functions are used to understand
> def flushSocket(s):
> """Use this function to flush a receivingsocket.
> This function will read data until no more data is present.
> This function must be used on a socket wich don't receive data anymore (buffsize = 0 for example)
> or this it may loop indefinitely, and return a socket which has already refilled itself.
> This function is needeed because it looks like puting socket receive buffer to 0 don't flush it,
> so when the buffer is enlarged, previous unread data is accessible.
> """
> while True: # loop
> sel = select([s],[],[],0) # timeout = 0 => immediate return
> if s in sel[0]: # is there data ?
> #print "(flushing socket)" # debug, show that this function was needed
> aFrame = s.recv(MTU) # we consume the data
> else:
> break; # no more data => exit
>
> def debugCheckIfSocketIsReallyEmpty(s):
> """This function is a test function in order to debug scapy sockets
> It prints errors if the socket contains data.
> """
> # check if there is data (recv() with flag MSG_DONTWAIT is not supported on all platforms)
> sel = select([s],[],[],0) # timeout = 0 => immediate return
> if s in sel[0]:
> aFrame = s.recv(MTU, socket.MSG_PEEK) # PEEK to not consume data. We only look at it.
> if aFrame is not None:
> log_runtime.error("unprepared socket has already data !")
> print "XXX buuuuug (1) !!!! XXX"
> # YEJ 2007/08/31 end
9266,9267c9296,9299
< self.ins = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(type))
< self.ins.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 0)
---
> self.ins = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(type)) # socket is created and can immediatly receive data
> # data can arrive now, especially if there is a lot of traffic
> self.ins.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 0) # socket cannot receive any more, but data can have been received between creation and now
> flushSocket(self.ins) # YEJ 2007/08/31 because putting RCVBUF to 0 don't flush the socket, data is still here and will be accessible when RCVBUF expands again
9275a9308
> debugCheckIfSocketIsReallyEmpty(self.ins) # YEJ 2007/08/31 XXX debug, to remove
9359a9393
> flushSocket(self.ins) # YEJ 2007/08/31 because putting RCVBUF to 0 don't flush the socket, data is still here and will be accessible when RCVBUF expands again
9368a9403
> debugCheckIfSocketIsReallyEmpty(self.ins) # YEJ 2007/08/31 XXX debug, to remove
9401a9437
> flushSocket(self.ins) # YEJ 2007/08/31 because putting RCVBUF to 0 don't flush the socket, data is still here and will be accessible when RCVBUF expands again
9424a9461
> debugCheckIfSocketIsReallyEmpty(self.ins) # YEJ 2007/08/31 XXX debug, to remove
7) Shame and credits
Thanks to scapy author for such a great tool.
Thanks for AGR who has spotted this bug.
Thanks to everyone who has read this until the end.
Shame to me if this bug is already corrected.
Shame to me if I've missed something.
Thales aide EADS ;-) Non, vous ne nous rachèterez pas !