distributed measurement problem
Shashank
Guest

Posted: Mon Nov 03, 2008 11:44 pm    Post subject: distributed measurement problem

Hi,

I am working on a distributed measurement project with a centralized
data collection node (server) and 28 clients, each with between one
and four interfaces.

I've written C code that captures packets on all the interfaces of the
node it runs on, computes statistics (pps, Mbps, etc. for different
subsets of traffic), and sends them to the server every second. The
server creates a file for each interface on each client and writes
these statistics into the respective files.

I've used Python to automate and synchronize the runs, so it basically
starts the C program in the background on each of the interfaces.

The problem is:
If I start the client program to run for, say, 200 seconds, the
clients run for the entire period, sending statistics every second to
the server. However, the files corresponding to some interfaces do not
contain the full 200 seconds, even though the client finishes execution
and the server closes the file only after the client has finished.

I don't think this is an issue with the server being flooded with data
(it's multithreaded, and the example below was run one node at a time)
or with packets being dropped (that wouldn't explain this problem:
ifconfig doesn't show dropped packets, and I am using TCP sockets
anyway). I am not sure whether there is a bug in my code, since it's
essentially the same client code on all systems.

Here is the wc -l output for three nodes, run one at a time for 200
seconds each:

Quote:
wc -l *.log
   44 core1.10.1.11.2.log
  200 core1.10.1.3.2.log
   49 core1.10.1.32.3.log
  200 core1.10.1.9.2.log
   49 core2.10.1.13.2.log
   49 core2.10.1.15.2.log
  200 core2.10.1.3.3.log
  200 core2.10.1.5.2.log
   49 core3.10.1.17.2.log
  200 core3.10.1.18.2.log
  200 core3.10.1.30.3.log
  200 core3.10.1.5.3.log
 1640 total

Each node has 4 interfaces, and although the experiment ran for 200
seconds, some files contain only 44 or 49 lines. ifconfig on the server
shows no dropped packets.

Does anyone have pointers on this?
Sorry for the long post,

Thanks,
Shashank
David Schwartz
Guest

Posted: Tue Nov 04, 2008 2:21 am    Post subject: Re: distributed measurement problem

On Nov 3, 3:44 pm, Shashank <shashank.shanb...@gmail.com> wrote:

Quote:
The problem is:
If I start the client program to run for, say, 200 seconds, the
clients run for the entire period, sending statistics every second to
the server. However, the files corresponding to some interfaces do not
contain the full 200 seconds, even though the client finishes execution
and the server closes the file only after the client has finished.

This doesn't fit the pattern for any "typical mistake" that I'm
familiar with. I'd suggest trying to localize the problem bit by bit.

For example, first modify the client software to checkpoint how many
reports it has sent to the server. Have a client log file, and have it
write a 'checkpoint' after every ten messages. Open the log file in
append mode, assemble the checkpoint message in a buffer, and send it
with a single call to 'write'. If the checkpoints don't show the 200
messages, then you know the client is the issue.
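
A minimal sketch of that client-side checkpoint, assuming a POSIX
system. The helper names (checkpoint_open, checkpoint_write), the log
line layout, and the interface label are hypothetical and not taken
from the original client code:

```c
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: open (or create) the checkpoint log in append mode. */
int checkpoint_open(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
}

/*
 * Hypothetical helper: call once per report sent. Every `interval`
 * reports it assembles one line in a buffer and writes it with a
 * single write() call. Returns 0 if nothing was due or the write
 * succeeded, -1 on error.
 */
int checkpoint_write(int fd, const char *iface,
                     unsigned long sent, unsigned long interval)
{
    char buf[128];
    int len;

    if (interval == 0 || sent % interval != 0)
        return 0;                       /* no checkpoint due yet */

    len = snprintf(buf, sizeof buf, "%ld %s checkpoint sent=%lu\n",
                   (long)time(NULL), iface, sent);
    if (len < 0 || (size_t)len >= sizeof buf)
        return -1;                      /* formatting error or truncation */

    return write(fd, buf, (size_t)len) == (ssize_t)len ? 0 : -1;
}
```

With an interval of 10 and a 200-second run, a healthy client should
leave 20 checkpoint lines per interface. Building the whole line first
and appending it in one write() is meant to keep lines from several
capture processes from interleaving if they share one log file.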

Then add similar checkpointing in the software that talks to the
client. Make sure the server software sees 200 messages. If not, then
you know something is screwy in that piece of software. (Perhaps the
client isn't really sending the messages? Perhaps the server is
dropping some of them?)
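
On the server side, one thing to keep in mind when counting is that TCP
is a byte stream, so a single read may return several reports or only
part of one. A rough sketch of a per-connection counter follows,
assuming each report is a newline-terminated text line. That wire
format is an assumption, since the thread does not describe it, and the
names report_counter and counter_feed are hypothetical:

```c
#include <stddef.h>

/* Hypothetical per-connection state; assumes each report ends with '\n'. */
struct report_counter {
    unsigned long complete;   /* newline-terminated reports seen so far */
    size_t partial_len;       /* bytes of a report not yet terminated */
};

/*
 * Feed the bytes exactly as returned by read()/recv(). Reports are
 * counted by their terminator, not by the number of read calls,
 * because one chunk may carry several reports or a fragment of one.
 */
void counter_feed(struct report_counter *rc, const char *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (data[i] == '\n') {
            rc->complete++;
            rc->partial_len = 0;
        } else {
            rc->partial_len++;
        }
    }
}
```

When the connection closes, logging rc.complete (and whether
rc.partial_len is nonzero) next to the client's checkpoint count shows
on which side of the socket the reports go missing.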

Keep going until you localize the problem.

DS
Joe Beanfish
Guest

Posted: Wed Nov 05, 2008 12:21 am    Post subject: Re: distributed measurement problem

Also timestamp your messages and look to see which ones are missing.
That may give you a clue of where to look for the problem.
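
As a sketch of how the timestamps could be checked afterwards: assuming
each report carries an integer Unix timestamp in seconds, that one
report is expected per second, and that the timestamps have already
been parsed into a sorted array, a small helper can list the gaps. The
function name report_gaps is hypothetical:

```c
#include <stdio.h>

/*
 * Hypothetical helper: scan sorted per-second timestamps, print every
 * gap to `out`, and return the total number of missing reports.
 */
long report_gaps(const long *ts, size_t n, FILE *out)
{
    long missing = 0;

    for (size_t i = 1; i < n; i++) {
        long delta = ts[i] - ts[i - 1];
        if (delta > 1) {
            fprintf(out, "gap: %ld report(s) missing between %ld and %ld\n",
                    delta - 1, ts[i - 1], ts[i]);
            missing += delta - 1;
        }
    }
    return missing;
}
```

This only finds holes between reports that did arrive. If a 49-line
file shows no gaps, comparing its first and last timestamps with the
200-second run window tells you whether the reports stopped early or
started late, which points at a different kind of bug than scattered
losses.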
Shashank
Guest

Posted: Tue Nov 11, 2008 8:34 am    Post subject: Re: distributed measurement problem

Hello,

Thanks to both of you for the suggestions.
The problem was actually in one of the anomaly detection algorithms I
was using.
I have sorted the problem out.
Thanks :)
Shashank