Discussion:
ssh connection not working when ssh server is behind a linux bridge
Adrian Pascalau
2018-01-28 20:15:34 UTC
Hi,

I have a strange issue with a linux bridge and an openssh server
running in a VM connected to that bridge. Basically when I ssh with
Putty or any other windows based ssh client to an openssh server
running in a centos VM connected to the external network through a
linux bridge, the ssh connection hangs before the login prompt is
shown.

What I have learned is that when an Ethernet frame that is less than
60 bytes in size goes through the network, it is padded with 0x00
bytes until it is 60 bytes long (64 with the frame check
sequence). When this kind of padded Ethernet frame goes from the
openssh server in my VM through the linux bridge to a windows host
where the Putty ssh client is, the 0x00 padding bytes are wrongly
treated as part of the TCP payload, so the upper-layer protocol
(SSH in my case) tries to interpret them, and the ssh session hangs.

My understanding is that those 0x00 padding bytes belong to the layer 2
Ethernet frame, and should not be counted as user data of
the higher level protocols. About the padding bytes I have found some
info here: https://wiki.wireshark.org/Ethernet#Allowed_Packet_Lengths
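
To make the layering argument concrete, here is a minimal Python sketch of how a receiver can tell TCP payload from Ethernet padding: the IPv4 Total Length field marks where the IP packet ends, and anything after it in the frame is layer 2 padding. The function name is hypothetical, and the sketch assumes an untagged (no VLAN) IPv4/TCP frame captured without the frame check sequence.

```python
import struct

ETH_HEADER_LEN = 14  # destination MAC + source MAC + EtherType (assumes no VLAN tag)


def split_ipv4_tcp_frame(frame: bytes):
    """Return (tcp_payload, ethernet_padding) for an untagged IPv4/TCP frame.

    Hypothetical helper for illustration only; it does no checksum validation.
    """
    ethertype = struct.unpack("!H", frame[12:14])[0]
    if ethertype != 0x0800:
        raise ValueError("not an IPv4 frame")

    ip = frame[ETH_HEADER_LEN:]
    ip_header_len = (ip[0] & 0x0F) * 4               # IHL is counted in 32-bit words
    ip_total_len = struct.unpack("!H", ip[2:4])[0]   # IP header + TCP header + payload
    if ip[9] != 6:
        raise ValueError("not a TCP segment")

    tcp = ip[ip_header_len:ip_total_len]             # bounded by Total Length, not frame size
    tcp_header_len = (tcp[12] >> 4) * 4              # data offset is counted in 32-bit words
    payload = tcp[tcp_header_len:]
    padding = ip[ip_total_len:]                      # whatever follows the IP packet
    return payload, padding
```

For a bare ACK with a Total Length of 40 carried in a 60-byte frame, this returns an empty payload and 6 bytes of padding. If the padding shows up as payload on the receiving side, something along the path presumably changed the length fields or the receiver ignored them.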

So I suspect this behavior is caused by the linux bridge, because
without the linux bridge, the ssh connection works without any issue.
What I mean is that if I ssh to the host operating system, the ssh
connection works without any issue, however if I ssh to the centos VM
that is running in that host operating system and that uses the linux
bridge for external network access, then I get the behavior
described above. In other words, this issue happens only when the
linux bridge is in the path of the ssh packets.

Now, my centos VM where I have this ssh issue is managed by libvirt,
and the bridge is in forwarding mode and created by hand. If the same
centos VM is migrated into an all-in-one OpenStack Pike host, where
the bridge is managed by neutron, the ssh connection works again
without any issue. In all cases I am talking about the latest centos
openssh server, and the default ssh server configuration file.

So, what do you think? Where should I look further to understand what
exactly causes this behavior?

Many thanks,
Adrian
Stephen Hemminger
2018-01-28 22:28:26 UTC
These symptoms sound like an MTU mismatch.
The padding is not related. More likely, the issue is that one side
is sending a larger frame than the MTU of the underlying interface.
Since the bridge is a pure layer 2 interface, it has no choice
but to drop any frame where the size is greater than the MTU.
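
One way to check for the MTU mismatch suggested here is to compare the MTU of the bridge with the MTU of each of its ports. The following Python sketch reads these values from Linux sysfs; the function names are hypothetical, and interface names such as `br0`, `eth0` and `vnet0` are placeholders, not names taken from this thread.

```python
from pathlib import Path

SYSFS_NET = "/sys/class/net"  # standard Linux sysfs location for network devices


def read_mtu(iface, sysfs_root=SYSFS_NET):
    """Read the MTU of one interface from <sysfs_root>/<iface>/mtu."""
    return int((Path(sysfs_root) / iface / "mtu").read_text().strip())


def bridge_ports(bridge, sysfs_root=SYSFS_NET):
    """List the ports attached to a bridge via <sysfs_root>/<bridge>/brif/."""
    return sorted(p.name for p in (Path(sysfs_root) / bridge / "brif").iterdir())


def smaller_mtu_ports(mtus):
    """Given {interface: mtu}, return the interfaces whose MTU is below the largest."""
    largest = max(mtus.values())
    return sorted(name for name, mtu in mtus.items() if mtu < largest)
```

On the host, one could build the mapping with `{name: read_mtu(name) for name in ["br0", *bridge_ports("br0")]}` and pass it to `smaller_mtu_ports`. An empty result means all interfaces agree, which would point away from an MTU mismatch on the bridge itself.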
Seweryn Niemiec
2018-01-29 09:42:19 UTC
I have a similar problem (same infrastructure and same symptoms) but with random
occurrence. The ssh session hangs randomly: usually before the login prompt is
shown, sometimes a bit later, sometimes after a few days. I have MTU 1500 on all
interfaces taking part in the communication. Ping of any size works and, as far as I
have tested, so does HTTP, but there are problems with HTTPS.
--
Best regards,
Seweryn Niemiec
Stephen Hemminger
2018-01-29 20:38:48 UTC
If you see that, it is usually some middlebox in the way interfering with packets.
Adrian Pascalau
2018-01-29 09:37:40 UTC
On Mon, Jan 29, 2018 at 9:51 AM, Adrian Pascalau wrote:
In this 54-byte TCP ACK frame, the Ethernet II header is 14 bytes
long, the IP header is 20 bytes long, the TCP header is another 20
bytes in length, and there is no TCP payload, so in total 54 bytes.
When this frame arrives on the client side, it is 60 bytes in length,
and it is interpreted by Wireshark as an SSHv2 frame, because of
the 6 additional 0x00 padding bytes in the TCP payload. I find this
exact frame in a working ssh session, and there those 6 additional 0x00
padding bytes are correctly shown at the Ethernet II frame level.
Ok, so I found a workaround for this, even though I do not know what
causes this issue.

Basically I noticed that I have this ssh connection issue only when
the ssh client runs on a windows host. If the ssh client runs on a
linux host, the ssh connection works without any problem. So I have
compared the tcpdump for ssh connections initiated from both windows
and linux, and what I have noticed is that on centos linux, by default
the TCP stack uses timestamps in the TCP Options, and because of this,
the Ethernet frames are never below 60 bytes, while on my windows host
the TCP Options timestamps are not used, and therefore some Ethernet
frames are less than 60 bytes.
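
The size difference can be checked with a little arithmetic. The Python sketch below assumes an untagged Ethernet frame, a 20-byte IPv4 header without options, and a TCP timestamps option laid out as 10 bytes plus two NOP bytes for alignment (12 bytes in total, a common layout but an assumption here). The function names are hypothetical.

```python
ETH_HEADER = 14            # Ethernet II header, no VLAN tag (assumption)
IPV4_HEADER = 20           # IPv4 header without options (assumption)
TCP_HEADER = 20            # TCP header without options
TCP_TIMESTAMP_OPTION = 12  # 10-byte timestamps option + 2 NOP bytes (assumed layout)
MIN_FRAME_NO_FCS = 60      # minimum Ethernet frame length, excluding the 4-byte FCS


def frame_length(payload_len=0, timestamps=False):
    """Length of an Ethernet frame carrying one TCP segment, before any padding."""
    options = TCP_TIMESTAMP_OPTION if timestamps else 0
    return ETH_HEADER + IPV4_HEADER + TCP_HEADER + options + payload_len


def padding_needed(length):
    """Number of padding bytes required to reach the Ethernet minimum."""
    return max(0, MIN_FRAME_NO_FCS - length)
```

Under these assumptions a bare ACK is 54 bytes without timestamps and needs 6 bytes of padding, while with timestamps it is 66 bytes and needs none. That fits the observation that only the client without timestamps ever sees padded frames.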

So I enabled the TCP Options timestamps in windows as well, by running
the command 'netsh int tcp set global timestamps=enabled', and just
like that, ssh started to work. Still, I do not know what is
causing this issue, or which component to blame for this behavior...

Any suggestion how to identify which network element wrongly assigns
the Ethernet padding to the TCP payload is more than welcome.