Re: Kernel panic due to NF_IP_LOCAL_OUT handler calling itself again

From: jimxu (jingx_at_nj.sc.mcel.mot.com)
Date: 03/03/05


Date: Thu, 3 Mar 2005 16:16:17 +0800

When you receive the ICMP destination-unreachable message, do you drop it or
pass it on to the kernel?
Also, can you reset the atomic_t variable?
Try setting a flag so that local_out_handler stops processing when the ICMP
destination-unreachable message is received.
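
Roughly what I mean by the flag, as a user-space sketch only (not kernel
code, and not taken from your module). It assumes the nested call comes from
inside okfn(), which is just one reading of what you describe. The names
fake_okfn, in_local_out, packets_routed, nested_calls_seen and the V_*
verdicts are made up for illustration; in the module the guard would be the
atomic_t you already have.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Verdicts loosely modelled on NF_ACCEPT / NF_STOLEN (illustrative values). */
enum verdict { V_ACCEPT, V_STOLEN };

/* Hypothetical packet: only records whether sending it provokes an error. */
struct pkt { bool triggers_error; };

static atomic_flag in_local_out = ATOMIC_FLAG_INIT; /* reentrancy guard */
static int packets_routed;      /* packets handled by the outer call */
static int nested_calls_seen;   /* nested calls that the guard bounced */

static enum verdict local_out_handler(struct pkt *p);

/* Stand-in for okfn(): ASSUMES the error packet is generated synchronously
 * and passes through the local-out hook before okfn() returns. */
static void fake_okfn(struct pkt *p)
{
    if (p->triggers_error) {
        struct pkt err = { .triggers_error = false };
        local_out_handler(&err);
    }
}

static enum verdict local_out_handler(struct pkt *p)
{
    /* If the flag was already set, we are inside an earlier call:
     * do no source-route work and let the packet go on untouched. */
    if (atomic_flag_test_and_set(&in_local_out)) {
        nested_calls_seen++;
        return V_ACCEPT;
    }

    packets_routed++;
    fake_okfn(p);                     /* may re-enter this handler */

    atomic_flag_clear(&in_local_out); /* reset the guard on the way out */
    return V_STOLEN;
}
```

The point is that the nested call does nothing except return, and the outer
call always resets the guard before it returns, so the next packet is handled
normally.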

-- 
jimxu
*NO PAINS, NO GAINS*
"Morfean" <vinayvinay@gmail.com> wrote in message
news:1109660437.001409.18580@g14g2000cwa.googlegroups.com...
> Hi,
> I am writing an implementation of a source routing protocol as a
> loadable module. I am using netfilter, I am not using the IP SSR
> option, I am using kernel 2.6.5, without smp and preemption support. My
> design is based on the DSR protocol. I have a header after the IP
> header, describing the source route and the route error. If 's' is the
> src, 'd' is the dst, and x1,x2 ... are the hops, the source route is :
> x1-x2...d. However unlike SSR I don't change the dst field of the ip
> header, which is set to d. Also every src routed packet carries with it
> an ack request for the next hop, to which the next hop is supposed to
> reply.
> The gist of my code is as follows:
>
> pre_route_handler:
> For all source routed packets:
> ackReply to previous hop
> nxtHop = getNextHop(packet);
> if(ip_route_input(myaddr,nxtHop)!=0){
> drop packet;
> }
> awaitAck(nxtHop);
> return NF_ACCEPT.
>
> In local_out_handler:
> create a packet with the source route. route it to the first hop,
> using ip_route_output_key(flowi,skb). I then do an awaitAck(firstHop).
>
> I intend to use this protocol in wireless networks. And hence I have
> implemented an ack based design, where every node is responsible for
> ensuring that the packet makes it to the next hop. For this the
> function awaitAck does the following:
>
> awaitAck(skb):
> add an ackRequest header.
> add a timer(a pointer to it) in the skb->cb.
> put this skb to a queue.
>
> When this timer fires:
> if rtxCount for the skb is less than MAX_RXMT, retransmit; otherwise send a
> route error on the reverse route to the source.
>
> This scheme is working fine when there are no errors. I am testing it
> across an ethernet lan by pinging. However when there are link
> failures, weird things happen.
> The topology I am using is A - - - B, where B is some fictional node
> not connected to A, but in the same subnet (the routing table is
> configured to send such packets out eth0). In the local out handler, I do
> an ip_route_output to B, which succeeds. I call the output function
> okfn, given to me by netfilter directly and return NF_STOLEN. The next
> packet to come to me in local_out is an icmp dest unreach. It is
> destined to me, so I accept it. The next packet is again an icmp dest
> unreach, and after that somehow my local_out_handler is called again,
> while the first call of it hasn't finished. (My kernel is not smp and
> not preemptible). At times this happens over and over. My kernel then
> panics, either due to a stack overflow, or some bad eip value, or
> something else (with eip value not decoded, and nowhere in the
> /proc/kallsyms). I detect this double calling of my local_out_handler
> by using an atomic_t variable. The same effect is seen if I return an
> NF_DROP or NF_ACCEPT on the skb in the reentrant call. I have also used
> spinlocks but the kernel always crashes.
>
> After some poking around the kernel code, I found that if an arp entry
> to a node is not present, an entry is created in the arptable with
> status set to pending. All packets waiting for this arp request to be
> resolved are queued up. When the arp request ultimately fails (times
> out), the packets in this queue are freed and an icmp destination
> unreachable error is sent back to the source of these packets.
>
> But I am still unable to figure out why EIP is getting corrupted (or
> the reason for the stack overflow).
>
> I am quite helpless. I have posted on several mailing lists (linux-net,
> kernelnewbies, linux-kernel), but haven't received any replies. Also, I
> couldn't find any decent material on the web related to my problem.
> I would be obliged if someone could please help me with this. What am I
> doing wrong, where can I find information on this?
> My code is completely based on Alex Song's DSR implementation for linux
> 2.4, available online at http://piconet.sf.net.
>
> Regards,
> Vinay Reddy
>

